ABSTRACT


The U. S. Army uses cash selective reenlistment bonuses (SRB) to encourage soldiers
in selected military occupation specialities (MOS) to reenlist. Estimates of the
reenlistment rate as a function of bonus level are needed for each MOS as input to a
bonus allocation model. This thesis outlines and uses a new method for predicting the
reenlistment rates as a function of bonus level.

The approach involves partitioning the soldier population into cells with stable 
reenlistment rates using demographic variables. The cells are aggregated using clustering 
techniques to produce groups of cells which exhibit homogeneity of reenlistment be- 
havior. Regression models are developed for each group of cells. MOS reenlistment 
rates are determined as a linear combination across cells. Cross-validation techniques 
are used to lend credibility to the predictive model. 

The study points out the usefulness of identifying categories of soldiers who display
unique reenlistment behavior. Integration of this technique with existing econometric
reenlistment models is recommended to further improve the predictive model.
I. INTRODUCTION


A. GENERAL 

Retaining qualified soldiers in the military after their terms of service are complete
continues to be one of the key issues in the all-volunteer Army. Reenlisting good
soldiers protects the military's extensive investment in training, and provides the stream
of soldiers needed for leadership and supervisory positions. Reenlistments are also a
powerful force alignment tool for the Army to balance job skills and grade structure.
Although there are many ways for personnel managers to influence reenlistment behavior,
the reenlistment cash bonus continues to be the most powerful and responsive tool
available.

The United States military has utilized reenlistment bonuses since the early 1960's
to improve retention in the services. Since 1974, however, the reenlistment bonuses have
been "selective", targeted at specially designated military job skills. To assist military
personnel managers in determining which job skills should receive reenlistment bonuses,
a large-scale optimization model was developed and refined at the Naval Postgraduate
School [Ref. 1: pp. 1-3]. This mathematical model recommends a set of bonuses that
attempts to minimize the expected deviation from a desired force structure under the
constraint of a given budget. A brief description of this military reenlistment bonus
model is in Appendix A.

Use of the military reenlistment bonus model by the U. S. Army is currently limited
because of the inadequacy of one of the model inputs, the predicted reenlistment rates.
These rates estimate the number of soldiers who will reenlist for each different job skill
at each potential bonus level.1 The military reenlistment bonus model uses these as
inputs to determine the most effective method to spend the limited bonus budget.

The purpose of this study is to develop a model to estimate the reenlistment bonus
response rates for U. S. Army enlisted personnel for use in the military reenlistment
bonus model.


1 It is important to understand that bonuses are a treatment, whose effect on the soldier
population is uncertain.


B. BACKGROUND 

Reenlistment cash bonuses are executed in the U. S. military through the selective
reenlistment bonus (SRB) program. The "selective" bonuses are targeted at specially
designated military occupation specialities (MOS) and year-of-service interval (zone)
combinations. The U. S. Army currently has over 350 different MOS's. Year-of-service
intervals are broken into three zones as follows:


Zone A      2-6 years-of-service
Zone B      6-10 years-of-service
Zone C      10-14 years-of-service 2


MOS and zone combinations are called cells, and there are over 1000 cells to which
the military reenlistment bonus model assigns bonus multipliers. The cash amount of a
bonus is computed as shown in Equation 1, where SRB is the cash bonus amount,
MBP is the soldier's current monthly base pay, YR is the number of years the soldier
reenlists for, and $MULT_{ij}$ is the bonus multiplier for MOS i and zone j.


$SRB = MBP \times YR \times MULT_{ij}$    (1)


One half of the bonus is paid as a lump sum on the day the soldier reenlists. The
remainder is paid in equal yearly installments over the duration of the reenlistment term.
Bonus multipliers range between zero and six, and although public law allows them to
take on continuous values, the Army restricts them to increments of 0.5. At any given
time, 15-25% of the 1000 cells have non-zero bonus multipliers, and the Army's yearly
budget for the bonus program is from $50-100 million.
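
The bonus computation and payment schedule described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample pay figure are hypothetical, and the code assumes that the remaining half of the bonus is divided into one installment per year of the reenlistment term, which the text implies but does not state exactly.

```python
def srb_payment_schedule(monthly_base_pay, years, multiplier):
    """Return (total bonus, lump sum, yearly installment) following Equation 1."""
    if years <= 0:
        raise ValueError("years must be positive")
    # Army practice described in the text: multipliers from 0 to 6 in steps of 0.5.
    if not 0 <= multiplier <= 6 or (multiplier * 2) != int(multiplier * 2):
        raise ValueError("multiplier must be between 0 and 6 in increments of 0.5")
    total = monthly_base_pay * years * multiplier  # SRB = MBP x YR x MULT
    lump_sum = total / 2                           # half is paid on the day of reenlistment
    # Assumption: the other half is split into one equal installment per year of the term.
    installment = (total - lump_sum) / years
    return total, lump_sum, installment


# Hypothetical soldier: $1,000 monthly base pay, 4-year reenlistment, multiplier 2.0.
total, lump_sum, installment = srb_payment_schedule(1000.0, 4, 2.0)
print(total, lump_sum, installment)  # prints: 8000.0 4000.0 1000.0
```

Rejecting multipliers that are not multiples of 0.5 mirrors the Army restriction noted above; public law would allow continuous values.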

The U. S. Army is currently experimenting by allowing bonus multipliers to vary by
rank within an MOS and zone combination. For example, an infantryman in Zone A
who achieves the rank of sergeant could receive a higher bonus than a soldier of the rank
of specialist, a lower rank.3 The purpose is to encourage more high quality soldiers to
reenlist.4 This experiment causes the bonus multiplier to have three dimensions
($MULT_{ijk}$): MOS, zone, and rank. While this study does not address the issue of
rank as a dimension of the bonus multiplier, the method outlined here is adaptable to
this approach.


2 Soldiers with under two or over fourteen years-of-service are not eligible for reenlistment
bonuses. Zone A is extended slightly, to allow soldiers who enlist for two years an opportunity to
reenlist prior to the end of their service term.


3 The rank of sergeant is pay grade E5. The rank of specialist is pay grade E4.


4 The assumption is that rank is a good measure of soldier quality, an assumption that is used
in this study.

Soldiers enlist in the military by signing a contract that obligates them to specific
terms of service (usually two to four years). As they near the end of their enlistment
term, soldiers have available to them the following options:


REENLIST            A soldier signs a new contract, obligating him or her to
                    an additional term of service. Bonuses are for
                    reenlistments of three years or more, and the length of
                    the reenlistment affects the amount of the bonus payment.


REENLIST/MIGRATE    Soldiers also may reenlist, but migrate to a new MOS.
                    Normally this is from an overstrength to an understrength
                    MOS. Usually, migrating soldiers do not receive
                    bonuses.5

EXTEND              Extending soldiers defer their reenlistment decision.
                    Extensions are for up to two years, and soldiers do not
                    receive bonuses for extending. Some soldiers extend
                    because they are currently ineligible to reenlist, and they
                    try to become eligible during the extension period. Other
                    soldiers extend to wait for more favorable bonus
                    multipliers. Still others extend to meet schooling, training,
                    deployment, overseas assignment or retirement time
                    remaining in service requirements. Because they are a
                    deferred reenlistment decision, extensions are a major
                    complicating factor to this study. They are addressed in
                    Appendix B.

ETS                 A soldier who does not make any of the above
                    decisions is discharged from the service at
                    the end of the contract period.


Soldiers are allowed to reenlist up to eight months prior to the end of their current
term of enlistment. Like extensions, this policy also clouds the issue of who is eligible
to reenlist at any given time. This issue is also addressed in Appendix B.

The above discussion serves to highlight a few important aspects of the SRB program.
For a more detailed overview of the reenlistment system, consult "The Effects of
Selective Reenlistment Bonuses on Retention," by Donald J. Cymrot [Ref. 2: pp. 4-9].


C. RESEARCH QUESTIONS 
The purpose of this section is to provide the motivation for the specific research areas
that will be pursued during this study.


5 Migrating soldiers can expect faster promotion rates in their new shortage MOS. 


1. MOS Grouping
This study is sponsored by the U. S. Total Army Personnel Command,
Alexandria, Virginia. Their task is to develop a model to estimate reenlistment response
rates for use in the military reenlistment bonus model. A brief review of the input form
required by the bonus optimization model motivates the approach of the study. Figure
1 shows a graphical example of the input requirement for the military reenlistment bonus
model.


[Figure: Reenlistment Rate as a Function of Bonus Multiplier]
Figure 1. Sample of Input Required for Bonus Model (Hypothetical)


The military reenlistment bonus model requires as input a function that takes a specified
bonus level and outputs the expected reenlistment rate, by MOS.6

A point to note is that the above example is MOS and zone specific. The bonus
optimization model requires over 1000 such functions (one for each cell). However, the
computer resources are not available to execute the 1000 different regression models
necessary to develop the 1000 different response functions. The goal of this study is to
develop a methodology to reduce the number of regression models, by some appropriate
grouping technique.


6 The actual function is input into the military reenlistment bonus model as a point estimate
for each of the various bonus levels.
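
The input form described above, one point estimate of the reenlistment rate per bonus level for each MOS and zone cell, can be pictured as a lookup table. The following Python sketch is illustrative only: the cell label "MOS_X", the rates, and the function names are all hypothetical, and only the first few bonus levels are filled in.

```python
def bonus_levels(step=0.5, maximum=6.0):
    """Allowed bonus multipliers: zero to six in increments of 0.5."""
    return [i * step for i in range(int(maximum / step) + 1)]


# Hypothetical point estimates for one cell: (MOS, zone) -> {multiplier: rate}.
# The rates are invented for illustration; they are not results of this study.
SAMPLE_INPUT = {
    ("MOS_X", "A"): {0.0: 0.30, 0.5: 0.33, 1.0: 0.35},
}


def expected_rate(table, mos, zone, multiplier):
    """Look up the point estimate of the reenlistment rate for one cell and bonus level."""
    cell = table.get((mos, zone))
    if cell is None or multiplier not in cell:
        raise KeyError(f"no estimate for MOS {mos}, zone {zone}, multiplier {multiplier}")
    return cell[multiplier]


print(len(bonus_levels()))  # 13 possible bonus levels per cell
print(expected_rate(SAMPLE_INPUT, "MOS_X", "A", 1.0))
```

With over 1000 cells and 13 possible bonus levels per cell, a complete table holds more than 13,000 point estimates, which is one way to see why estimating each cell separately is costly.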

A brief review of past attempts at grouping of MOS's gives some perspective to
this research question. The first attempts at grouping combined all MOS's together.
They estimated one set of reenlistment response rates for all MOS's. One study taking
this approach is Enns [Ref. 3: pp. 1-3]. The problem with this approach is that there is
evidence of the varying effects of reenlistment bonuses among MOS's. The strongest
evidence of this is found in research by Lakhani and Gilroy [Ref. 4: p. 253].

The next attempt was to estimate a separate reenlistment response for each
different MOS. In addition to the problem noted above (the requirement for 1000 different
regression equations), there are a number of additional problems with this approach.
The first problem is that since bonuses are allocated by MOS, it follows that all soldiers
in the same MOS (and zone) receive the same bonus [Ref. 3: p. vi]. This limits the
number of observations at different bonus levels available for use in the regression. To
further complicate this problem, only 15-25% of the over 1000 cells have non-zero bonus
multipliers at any given time. Large numbers of cells never have a bonus, or have such
a limited bonus history that estimation by regression techniques is meaningless.

A second problem with estimating a separate reenlistment response rate for each
MOS is that bonuses within a specialty often do not change from year to year. This is
due to the fact that bonuses are often given to critical MOS's, and these MOS's
remain critical over time. One study by Hosek and Peterson [Ref. 6: pp. 19-22] estimates
the correlation of bonus levels in adjoining time periods to be 0.8 for specialities
receiving a bonus. This correlation causes the regression model to behave poorly.

A third problem is that this technique assumes the MOS is a homogeneous
grouping of soldiers with similar reenlistment probabilities. However, in his research,
Kohler questions this assumption and shows that MOS's are not homogeneous
groupings [Ref. 3: p. 4].

To correct for the deficiencies with estimating reenlistment response rates, most
researchers have grouped MOS's. The advantage to this approach is that by grouping
MOS's with varying bonus levels together, the regression estimates become more
meaningful. Two basic approaches are used. The first approach is to group MOS's into
career management fields (CMF's). The Army currently has 32 CMF's. Studies using
this technique include a study of Army reenlistment and extension decisions by Lakhani
and Gilroy [Ref. 4: p. 252]. The problem with this approach is that the CMF's are
administrative groupings, and CMF's often group occupations with little in common [Ref.
S: p. A].

The second approach is to assign MOS's into groups with similar job
characteristics. These characteristics tend to key on how technical the job is, what the
skill's potential combat exposure is, or what the skill's civilian opportunities are.
Presented below is a listing of groupings in the Concepts Analysis Agency (CAA) bonus
study [Ref. 7: p. 4-21].

• Direct combat
• Combat operations
• Communications electronic operations
• Communications electronic maintenance
• Mechanical maintenance
• Supply services transportation
• Medical
• Administration
• Engineer construction
• Intelligence


Groupings such as these make intuitive sense. However, analysis supporting use of these
groupings is lacking. The key point is that the goal of grouping is not only to reduce the
number of regressions to be performed, but also to form groups with similar reenlistment
behavior. Therefore, to improve the quality of the estimates of reenlistment response
rates, this study develops techniques to identify groupings of soldiers with similar
reenlistment probabilities.
2. Variables to be Considered 

The study of the effects of reenlistment bonuses is not a trivial problem. It is
difficult to determine why soldiers decide to stay or leave the service. There are many
factors which impact a soldier's reenlistment decision, as diverse as what the job
opportunities in his hometown are, to whether he is well adjusted within his organization, to
what the congressional action is on pay raises for the next year. The reenlistment
decision is based not only on the bonus offered, but upon many other factors, both
quantifiable and unquantifiable. The impact of these other factors is seen in Figure 2, which
is a scatterplot of quarterly reenlistment rates for ten different Zone A MOS's over four
years, as a function of the bonus level. Although there is a general increasing trend in
the reenlistment rate, many other factors are working to produce the observed variance.
Without the explanatory effect of other variables, it is difficult to determine the true
effects of the reenlistment bonus.


[Figure: Reenlistment Rates as a Function of Bonus Multiplier, Ten MOS's, Over Seven Years]
Figure 2. Yearly Reenlistment Rates for Ten MOS's Over Seven Years


Many researchers fail to examine the full range of potential, quantifiable
explanatory variables available. For example, the 1982 CAA study uses only three
explanatory variables: the bonus level, unemployment, and the inflation rate [Ref. 7: p.
4-10]. Only two studies, a study by Chow and Polich [Ref. 8: pp. 29-31] and a study by
Hiller [Ref. 9: pp. 20-31], examine a full range of variables.

This study examines a full range of potential, quantifiable explanatory variables.
First, a theoretical framework of the reenlistment decision making process is developed.
This framework guides the selection of variables and the gathering of data. Exploratory
data analysis techniques are used to determine which of the variables are most
appropriate for inclusion in the regression equations. Cross-validation is used to lend
credibility to this analysis.


Special attention is paid to the effects of variables that the Army manipulates
to influence retention. Variables the Army manipulates in this manner are called force
alignment variables.

3. Summary of Research Questions 
In summary, the following are the primary research questions of this study.
• Which variables to include in the models?
• How do force alignment variables impact reenlistment?
• How to group soldiers to reduce the number of regression models required, and
ensure homogeneous groupings?
• How to address MOS migration and extensions, along with reenlistment eligibility
requirements without complicating the model?
• What confidence to place in the estimates?


D. SCOPE OF THESIS 

Due to the stated purpose of this study, research is limited to active duty U. S. Army
enlisted soldiers, with between 2 and 14 years-of-service. Within this framework, the
emphasis is placed on Zone A reenlistments,7 as the large majority of the bonus
recipients are in Zone A.

Because of the extensive research conducted in this area, an attempt is made to draw
on previous studies to put together a comprehensive study of estimating reenlistment
behavior for the U. S. Army. However, because of the requirement to estimate
coefficients for all MOS's, individual MOS differences which warrant special attention are for
the most part ignored.

One final note. This study does not address the issue of quality of the reenlisting
soldier. Because the military reenlistment bonus model does not distinguish between
soldiers, all soldiers qualified to reenlist are assumed to be of equal quality.8


E. ORGANIZATION OF THESIS 
Chapter II is a review of the literature relevant to the estimation of reenlistment
response rates.

Chapter III develops a theoretical framework for the reenlistment process, and the
data base is structured using this framework.

Chapter IV describes the solution technique.

Chapter V shows, in detail, the solution of the Zone A problem. Chapter V also
discusses the validation of the Zone A model and the precision of the model. Chapter
VI gives the conclusions and recommendations for further study.

The appendices contain various details of interest to the reader, including
background on the military reenlistment bonus model, details on how the study deals with
factors such as MOS migration and extensions, and issues such as variable selection, data
set cleaning, regression models and statistical tests.


7 Zone A extends from 2-6 years-of-service (YOS), Zone B from 6-10 YOS and Zone C from
10-14 YOS.


8 The experiment outlined in the introduction (page 2), which treats rank as a separate
dimension, attempts to address the quality issue. However, within the new cell (dimensioned by
MOS, zone and rank), all soldiers are considered of equal quality and the same assumption is made
here.


F. STATISTICAL PACKAGES 
The statistical package used in this study is SAS, by the SAS Institute. Graphics
were produced using a pre-release version of GRAPHSTAT by IBM.


II. REVIEW OF THE LITERATURE 


A. GENERAL 

The purpose of this chapter is to review the literature on the estimation of
reenlistment rates, with the purpose of providing motivation for the techniques of this
study. The issue of reenlistment bonuses is well studied; this review addresses only a
portion of the work done.


B. ARMY STUDIES 

The 1982 Concepts Analysis Agency (CAA) study addresses both a method for
optimizing bonus payments, and estimates of reenlistment bonus response rates [Ref. 7: p.
3-16]. The study calls these rates SRB effectiveness coefficients, and the coefficients they
estimated in 1982 are still in use today by the Force Alignment Branch of the U. S. Total
Army Personnel Command.

The CAA study uses 1976-1981 data and variables to measure the bonus level, the
unemployment rate, and the inflation rate. Over 320 MOS's are grouped into ten skill
groups,9 and linear regression models are used to estimate the SRB effectiveness
coefficients.10 The study does not estimate reenlistment rates; instead it recommends using
the current reenlistment rate as the forecast reenlistment rate.

A second study of Army bonus response rates, by Higham [Ref. 10: pp. 9-13], uses
linear regression and variables that measure the bonus level, year, calendar quarter,
unemployment rate and inflation rate to estimate reenlistment rates. The study estimates
reenlistment rates for twenty-four MOS's with good bonus histories, and then describes
techniques to extrapolate the results to the remaining 300 MOS's.

Both of these studies use linear regressions; Appendix I explains why logistic
regression is preferred over linear regression in studies such as these. Both studies also
examine a limited number of explanatory variables. One of the goals of this study is to
examine a large number of variables for inclusion in the model. Neither study presents
cross-validation results for their models. This study uses cross-validation to ensure
model fit.


9 These skill groups are listed on page 6 


10 The SRB effectiveness coefficients are the percentage increase in the reenlistment rate due
to a one step increase in the bonus multiplier.




Another study of reenlistment propensities has been done by economists of the
Army Research Institute for the Behavioral and Social Sciences [Ref. 4: pp. 229-232].
The study uses bonus levels, a civilian military wage index, the unemployment rate, the
soldier's AFQT score,11 race, and family size, and groups soldiers by career management
field. This study is interesting in two respects. First, it examines three choices in the
reenlistment decision making process, and therefore applies multinomial logistic
regression. The three choices are to reenlist, to extend, or to leave the service.
Researchers are split over whether to treat the extension decision as a separate choice, or
to treat it as a deferred reenlistment decision. Our study chooses to treat extensions as
a deferred reenlistment decision. Appendix B gives further explanation and justification.

A second interesting aspect of the study is the grouping of MOS's into career
management fields.12 Many MOS's do not have adequate bonus histories for
regression models. Therefore, most studies group MOS's, either into career management
fields or into groupings with similar job characteristics. A goal of our study is to
examine an alternative grouping technique, in which soldiers are grouped according to
their reenlistment probabilities, regardless of which MOS's they are in.

A final Army study discussed here is by two economists at the United States
Military Academy [Ref. 11: pp. 211-212]. This study points to the examination of
demographic variables, such as race, sex, and family size, as the method to form homogeneous
groupings of soldiers with similar reenlistment probabilities. This method is followed in
this study.


C. ACOL STUDIES 

The Navy has done extensive research into the prediction of reenlistment response
rates. The annualized cost of leaving model (ACOL) represents the current state of the
art of its research [Ref. 12: pp. 2-5]. ACOL models the reenlistment decision making
process by examining the present value of the soldier's military pay potential and his or
her civilian pay potential. It also examines the soldier's "taste for military service". The
model has a great deal of potential; however, it does carry some difficult to validate
assumptions, such as the time horizon over which a soldier makes a decision, his or her
discount rate, what his or her civilian earnings potential is, and whether the soldier's
perceptions of his or her earning potential are close to realistic.


11 AFQT is the Armed Forces Qualification Test.


12 Career management fields are an administrative grouping of MOS's used by personnel 
managers to administer personnel programs. 


One study that uses this ACOL methodology is a Marine Corps study by Cymrot
[Ref. 2: pp. 24-25]. Cymrot groups marines into twenty-two skill families, and uses the
one year difference between the military pay and civilian pay potential, along with
variables to measure the bonus level, the unemployment rate, and the current rank of the
soldier.

The ACOL model holds a great deal of potential for predicting reenlistment rates.
However, for reasons of scope and data availability, it is not fully incorporated into this
study. Instead, variables that measure the first year difference between civilian and
military wages are included in this study, in a manner similar to the Cymrot study
approach.

This brief review of the literature serves to further motivate the research questions
introduced in Chapter I. Additional review of the literature appears in Chapter III.





III. DATA BASE


A. GENERAL 

One of the shortcomings of many previous reenlistment studies is that they fail to
consider a broad range of variables which may explain reenlistment behavior. For
example, the 1982 Concepts Analysis Agency study examines only three explanatory
variables: the bonus level, the inflation rate, and the unemployment rate [Ref. 7: p. 4-10].
One of the goals of this study is to examine a full range of potential, quantifiable
explanatory variables.

The purpose of this chapter is to describe the selection of variables and the
development of the data base. A conceptual framework is developed to give focus and
direction to the data gathering effort. At this point, it is not important to assess the
potential significance of any particular variable, or to establish relationships between
them; instead it is sufficient to create a list of promising variables. In Chapter V,
exploratory data analysis techniques determine which variables to include in the regression
equations. Seven variables are included in the regression model.

This chapter focuses primarily on the conceptual framework for the Zone A 
reenlistment decision. 

1. Source of Data 

Data for this project comes primarily from the Defense Manpower Data Center
(DMDC), in Monterey, California. The mission of this organization is to archive
manpower data from all services for use in studies such as this. The Army gain loss file is the
primary source of data for the project. Other data includes economic variables from
sources such as the Bureau of Labor Statistics.

The data available from DMDC are records of soldiers actually making
reenlistment decisions. Individual-level records are chosen for the analysis rather than
group-level data because the latter provides only limited insight into which variables
influence soldier retention. To study the determinants of reenlistment behavior, data on
individuals themselves are most appropriate [Ref. 13: p. 3]. However, the analysis of
individual-level data is not without its costs in computing time and data storage
requirements.


2. Response Variable 
The response variable for the study is binomial: either the soldier chooses to
reenlist in his or her MOS or not. Some studies model the reenlistment decision-making
process as a multinomial choice of reenlistment, extension, or leaving the service.
Appendix B addresses the issue of why a binomial response variable is chosen over a
multinomial response variable.
3. Explanatory Variables 
This study includes a variable in the data base if it is quantifiable and if there
is some indication (hypothesized or in previous literature) that this factor explains the
reenlistment decision-making process.13 The ideal variable is one that is also predictable
in the future [Ref. 14: p. 20]. In those cases where a primary variable is not quantifiable,
the study develops surrogate variables. For example, it is difficult to quantify the
success of a soldier. This study uses the rank the soldier achieves and the speed with which
he achieves it as surrogates for military success.
4. Survey Data 
Survey data is not included in the data set. Unfortunately, this eliminates the
only way to measure a considerable number of reenlistment factors, especially those
concerning soldier attitudes towards their jobs and living conditions. However, the
problems with survey data are twofold. First, it is impossible to match survey data with
the individual records. Second, although some past surveys are available, the survey
effort falls considerably short of the scope of the individual data gathering effort. Survey
data, and the studies that analyze it, assist in providing the insight necessary to choose
variables for this study. However, survey data is not available to measure the
variables.
5. Time Period Covered 
The data base covers the period from the fourth quarter, FY80 through the first
quarter, FY89, 34 quarters of data in all. Data obtained before 1980 are not included
for practical reasons. Prior to that date, DMDC stored data in the gain loss file in a
different format than is used at present. Conversion of that data is an expensive,
time-consuming process, which is not justified for this project.14


13 If a variable explains the reenlistment decision-making process it means that it reduces the
uncertainty of prediction of reenlistment rates.


14 One advantage to including more data (prior to 1980) in the study is to improve the range
of values of the explanatory variables. However, analysis shows that all variables have a good range
of values, and only modest improvement is achievable by including values from 1974-1979. A


6. Size of Data Set 
The data set contains the records of over 500,000 Zone A soldiers making their
reenlistment decisions. The study breaks the data into two groups, one group of data
for analysis and development of the regression models, and the second group of data for
validation. Numerous previous studies have neglected the validation process; the latter
step is a requirement for lending credibility to any predictive model.
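
The two-group split described above can be sketched as a simple random holdout. This Python fragment is a minimal illustration under stated assumptions: the text does not say how records were assigned to the two groups or in what proportion, so the random assignment, the 25 percent validation share, and the function name are all hypothetical.

```python
import random


def split_records(records, validation_fraction=0.25, seed=0):
    """Randomly divide records into an analysis group and a validation group."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_validation = int(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]   # held out; never used to fit the models
    analysis = shuffled[n_validation:]     # used to develop the regression models
    return analysis, validation


# Stand-in for individual soldier records (here just record numbers).
analysis, validation = split_records(range(100))
print(len(analysis), len(validation))  # prints: 75 25
```

Models would be fit on the analysis group only, and their predicted reenlistment rates then compared with the observed rates in the validation group.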


B. CONCEPTUAL FRAMEWORK 
We hypothesize that the reenlistment decision-making process of a soldier
considering reenlisting for the first time depends on the following four factors.
• The soldier's initial motivation for military service.
• The soldier's success in the military and satisfaction with military life.
• The soldier's evaluation of the potential for success outside the military.
• The influence of Army reenlistment policies on the soldier's initial decision to stay
or leave.
First, some comments on the specifics of this framework.
1. Initial Motivation for Military Service
Previous research supports the hypothesis that initial enlistment motivation
influences a soldier's first term reenlistment behavior.15 For example, an Air Force study
of first-term reenlistment intentions of avionics technicians lists career intentions at the
time of enlistment as the most important factor contributing to the technician's
reenlistment plans [Ref. 15: p. vii]. Of course the difficulty is measuring enlistment
motivation. The most direct way is to survey soldiers; however, historical survey data is
not available. Instead, this study uses the following variables to gain insight into
enlistment motivation.
• Army College Fund Program Participation (ACF)
• Enlistment Bonus
• Enlistment Term
• Enlistment Program Training Program
• Age at Enlistment
• Age at Separation
• Education at Enlistment
• Dependent Status at Enlistment
• Prior Service
• T Nese ee Line
• Youth Program
• Hometown
• Unemployment Rate at Time of Enlistment


second reason not to include data prior to 1980 is that relationships between explanatory variables
and the dependent variable may change over time; emphasis is best placed on the more recent history.


15 The terms Zone A and first term are interchangeable in this study. Both refer to soldiers
making their first reenlistment decision, usually after two to four years of service.


The study uses these variables to determine whether a soldier is job-, training- or
education-motivated. While these variables do not directly measure a soldier's enlistment
motivation, they give insight into it, which in turn helps predict the soldier's reenlistment
propensity.

Appendix C gives a detailed discussion of each of these variables. 

2. Success in the Service and Satisfaction with Military Life 

The soldier's motivation for entering the service determines his or her initial
reenlistment propensity. However, the success the soldier achieves in the first term, and
his or her satisfaction with military life, profoundly affect this initial reenlistment
propensity. As before, there are problems with directly measuring these factors. For
example, the military uses items such as enlisted evaluation reports, skill qualification tests,
awards, and promotion rates to measure a soldier's success. Of these, only promotion
rate information is available for use in this study. However, numerous studies
support using promotion rates as a measure of success in the military. In one study by
Ward [Ref. 16: p. v], promotion speed relative to that of peers is the only indicator of a
high level of achievement. Two studies go further and try to predict promotion rates
using intelligence and educational scores. Although the results of these studies are
neither consistent nor particularly strong, this study includes intelligence and educational
variables [Ref. 16: pp. 1-3] [Ref. 17: p. 14].

Measuring a soldier's satisfaction with military life is also difficult. However,
numerous studies find that quality of life issues appear to have little effect on the first
term reenlistment decision, although the impact of these factors increases dramatically in
importance thereafter. For example, one study uses survey data to show that although
military families do not like separations, they do not leave the service because of them
[Ref. 18: p. 27]. Supporting this is a study which finds the effects of factors such as
family separations are not significant in the first term reenlistment model [Ref. 8: p. 25].
Two studies by the Navy Personnel Research and Development Center find that quality
of life issues are not statistically significant predictors of first term reenlistment intent
[Ref. 18: p. vi] [Ref. 19: p. vi]. One quality-of-life issue that has some significance is first
term duty location. One researcher finds that soldiers stationed overseas during their
first term have reenlistment rates higher than those stationed in the continental United
States [Ref. 8: p. 23].
As a result of the above arguments, this study includes the following variables. 

• Character-of-Service
• Promotion Rates
• eee | OF Score
• Mental Test Category
• SES Score
• Education Level at Reenlistment
• Change in Education
• Years-of-Service
• Current Rank
• Duty Location
• Dependent Status at Reenlistment
• Change in Dependent Status


Appendix D discusses each of these variables in more depth and provides further
motivation for including them in the analysis.
3. Evaluation of Potential in the Civilian Sector 

The literature develops a conceptual framework to explain the reenlistment
decision-making process of soldiers. The framework starts by looking at the soldier's
initial enlistment motivation. This motivation (whether it is job, training or education)
gives the soldier an initial bias towards staying or leaving the service. The soldier's initial
bias is changed based on the success the soldier achieves in the first enlistment term and
his or her adjustment to military life. Many soldiers decide during the first term that the
Army is not for them, and they leave the service. However, we hypothesize that many
soldiers decide whether to stay or leave the service after making a comparison of their
military and civilian potential. The purpose of this section is to discuss the variables
associated with this comparison.

An issue is whether soldiers can make meaningful evaluations of their potential
in the civilian sector. This study assumes they can. Secondary issues are: how can the
study measure the soldier's opportunities, and does the study's evaluation of a soldier's
potential match the soldier's evaluation of his or her potential?

There are a number of ways to measure the civilian opportunities available to
a soldier. One way is to look at the job category the soldier is in, and employment
growth of comparable civilian jobs. Another is to look at the civilian military wage
index. These efforts are hampered by the incompatibility of numerous Army skills with
comparable civilian skills. Additionally, national economic indicators such as gross
national product (GNP), consumer price index (CPI), and the unemployment rate are
used to assess the civilian opportunities available to the soldier.

Finally, the study uses demographic variables as surrogates for the civilian
versus military evaluation a soldier makes. Researchers note that women and black soldiers
reenlist at higher rates than white male soldiers. The researchers hypothesize that this
is due to women and blacks seeing insufficient job opportunities in the civilian sector,
as compared to military career options. Additionally, researchers hypothesize that
women and blacks see enhanced promotion opportunity in the military as compared to
the civilian sector ker O)

The study therefore uses the following variables to explain the soldier's
evaluation of potential in the civilian sector:

• Race
• Ethnic Group
• Sex
• Job Type
• Unemployment Rate
• Civilian Military Wage Index
• Consumer Price Index
• Gross National Product
• Percentage Growth Civilian Jobs


Appendix E describes each of the above variables in more depth and provides
further motivation for including them in the study.

4. Reenlistment Policy Variables

After soldiers compare opportunities in the civilian sector to those in the
military, they make an initial reenlistment decision. However, the impact of Army
reenlistment policies can change this decision. For example, a soldier who initially
decides not to reenlist may change his mind in response to the offer of a reenlistment cash
bonus. A soldier who initially wants to reenlist may change her mind because she is
unable to get the reenlistment option of the training or duty station she desires.
Additionally, changes in reenlistment eligibility may make the soldier ineligible to reenlist.
The above are examples of the effects of reenlistment policy variables.
The Army is not able to directly manipulate all variables listed in this section.
For example, military pay and the retirement programs are policies that the Army can
only recommend to Congress. However, all the variables in this section are policy
variables at some level in the government.
The study includes the following policy variables:

• Retirement System
• Number of Years to Military Retirement
• Real Military Compensation (RMC)
• RMC Adjusted by Inflation
• Bonus Payment
• Type of Bonus Payment
• Job Skill Migration
• Promotion Rate Forecast
• Reenlistment Eligibility Criteria
• Reenlistment System


Appendix F discusses each of these variables in more depth and the motivation
for including each of them in the analysis.


C. SIGNIFICANCE OF UNQUANTIFIABLE VARIABLES 

Despite including over forty variables in this study, there are still numerous
unquantifiable factors which may explain the reenlistment decision-making process. Those
related to satisfaction with military life appear to have little effect on the Zone A
decision. However, this study also excludes job satisfaction variables, such as autonomy,
physical work environment, skill utilization, team effort, and relationships with peers,
subordinates and supervisors. This is unfortunate, because studies show job satisfaction
is extremely important for the first term reenlistment model16 [Ref. 20: p. 11]. Job
satisfaction variables are excluded because they are not measurable, except by survey, and
survey data is not available in sufficient detail to match the study's data set.
Additionally, job satisfaction variables are difficult to predict (forecast) and therefore do not
fit well in the reenlistment model.

What is the significance of omitting variables such as job satisfaction? More unex- 
plained variance may appear in the regression models, which leads to less precision and 
confidence in the reenlistment response rates. We discuss these issues in more depth
later.


D. CLEANING THE DATA SET 

Initial study indicates that the data set has a considerable amount of inaccurate
data. For example, Figure 3 shows the variable TERM OF ENLISTMENT. For this
variable, 6.1% of the entries are for zero or one years, or for more than four years, which
are invalid terms of enlistment.17 Analysis shows that invalid data rates range from
0-15% for most variables; however, seven of the variables have error rates of 15-25%.18

Clearly there is a need to investigate the source of the data errors, and determine the
potential impact on the analysis. This investigation revealed that every entry for FY81
is in error for the seven variables with error rates of 15-25%. Discussions with DMDC
determined that the data file used in this study was a merging of two other data files, and
in the case of FY81, this merging was incorrectly performed. While DMDC is correcting
the problem for future use, the corrections were not available for use in this study.
Therefore, FY81 data were excluded from further analysis.

DMDC referred us to the U. S. Total Army Personnel Command for an explanation
of the error rate of up to 15% on the remaining variables. The information systems
managers acknowledged that they had difficulty obtaining accurate data from Army
organizations, and although they said efforts are underway to improve the quality of the
data, they offered few suggestions of how we could improve our data set.

Rather than discard all records with invalid data, an attempt was made to clean the
data set by cross referencing other data. An example is the variable TERM OF
ENLISTMENT. Figure 3 shows the errors in this variable for a random sample of
IS records.

16 However, job satisfaction decreases in importance in the second term.

17 Inaccurate data are determined by consulting the appropriate Army Regulation for the
acceptable ranges of entries.

18 There is no missing data in the data set.

Figure 3. Frequency Counts for the Variable Term of Enlistment, Uncleaned


TERM OF ENLISTMENT values of zero and one year are not valid, nor are values
of greater than four years. The study corrects for this by examining enlistment dates and
reenlistment dates and inferring from this the enlistment term. Following cleaning, the
variable TERM OF ENLISTMENT has the distribution of Figure 4.

Figure 4. Frequency Counts for the Variable Term of Enlistment, Cleaned


Using procedures such as described above, much of the invalid data was corrected.
Appendix G lists the amount remaining by variable. Error rates range from 0-7.8%,
with numerous variables having less than 1% invalid data. As a part of the cleaning
process, all remaining invalid data were recoded as missing data.
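
The cleaning rule for TERM OF ENLISTMENT can be sketched in a few lines. This is a minimal illustration, not the study's actual SAS procedure: the function name, the record layout, and the rounding rule used to infer the term from the two dates are assumptions made for the example. Only the valid range (two to four years) and the recoding of uncorrectable entries as missing come from the text.

```python
from datetime import date
from typing import Optional

# Valid terms per the text: zero, one, and more than four years are invalid.
VALID_TERMS = {2, 3, 4}

def infer_term(recorded: int, enlisted: date, decision: date) -> Optional[int]:
    """Return a valid term of enlistment, or None (missing) if none can be inferred."""
    if recorded in VALID_TERMS:
        return recorded
    # Assumption: approximate the term as elapsed time rounded to whole years.
    years = round((decision - enlisted).days / 365.25)
    return years if years in VALID_TERMS else None

# Hypothetical records: (recorded term, enlistment date, reenlistment/separation date)
records = [
    (3, date(1982, 6, 1), date(1985, 5, 15)),   # recorded value already valid
    (0, date(1981, 9, 1), date(1985, 8, 20)),   # invalid, but the dates imply four years
    (1, date(1984, 1, 10), date(1984, 11, 1)),  # invalid and not recoverable from dates
]
cleaned = [infer_term(*r) for r in records]
```

In this sketch the third record stays unrecoverable and is returned as `None`, which mirrors recoding remaining invalid data as missing.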

The question is whether the amount of missing data listed in Appendix G is
acceptable, or if additional cleaning is necessary. The SAS statistical procedures of this
study exclude observations with missing values from further analysis [Ref. 21: p. 550].
Therefore, missing values are of concern if they constitute a high percentage of the
observations in the multidimensional analysis, or if the missing values are not randomly
distributed throughout the observations.19 However, our analysis shows that the amount
of remaining missing data is reasonable, and that the missing data does not change the
results of our analysis. Appendix G shows the results of the statistical procedures that
support these conclusions. Therefore, no further cleaning of the data set is done. Continuous
variables are cleaned in a similar manner.


19 An example of non-randomly distributed missing values is the seven incorrectly coded
variables of 1981, discussed above.

IV. METHODOLOGY


A. GENERAL

The purpose of this chapter is to motivate the new methodology for predicting
reenlistment rates.


B. MOTIVATION FOR THE METHODOLOGY 
1. Problems With Current Solution

The purpose of this study is to predict reenlistment rates for each of the Army's
350 military occupation specialities (MOS). However, it is impractical to do a separate
regression on each of the different MOS's for a number of reasons. These reasons were
discussed in some detail in Chapter I, and are reviewed here.


• Many of the 350 MOS's (60-70%) have never (or infrequently) been assigned a
reenlistment bonus. Estimates of regression coefficients for those MOS's produce
misleading results, because of the inadequate range of bonus values.

• All soldiers in an MOS receive the same bonus level at the same time, and therefore
it is difficult to separate the effects of the bonus level from other explanatory
variables.

• Bonus levels have a very high correlation from year to year within an MOS, which
degrades the accuracy of the regression results.

• There is evidence that MOS's do not represent homogenous groups of soldiers with
similar probabilities of reenlisting. Therefore, considerable variance is added to the
problem before the regression is conducted.

Numerous previous studies have addressed these problems by grouping MOS's
together, usually forming 10-20 groups of 10-50 MOS's. Grouping in this manner is
usually done by combining MOS's that have similar job characteristics. The Concepts
Analysis Agency study uses this approach [Ref. 7: p. 4-21].

Forming groupings of MOS's in this manner solves the first three of the four
problems listed above. There are, however, two criticisms of this technique of grouping
MOS's. First, the groupings are formed on an intuitive basis, and no attempt is made
to quantitatively determine if the grouping is sensible. Second, the fourth problem listed
above (MOS's are not a homogeneous grouping of soldiers with similar probabilities of
reenlisting) is not solved. Clearly, if an MOS is not a grouping of soldiers with similar
probabilities of reenlisting, then neither is a grouping of MOS's.


A major theme of this thesis is analysis of a new technique of grouping soldiers.
The methodology looks for groupings of soldiers with similar probabilities of reenlisting,
independent of their military occupation specialities. Since the groups contain soldiers
of differing MOS's, they have robust bonus histories, and less correlation from year to
year. Potentially, this grouping technique solves all four of the problems listed above.

To more fully explain and motivate this solution, the assertion that an MOS is
not a collection of soldiers with similar probabilities of reenlisting is now examined.

2. Non-homogenous MOS 

Previous research supports the assertion that an MOS is not a homogenous
grouping of soldiers with similar probabilities of reenlisting [Ref. 3: p. 4]. This section
provides examples to illustrate the point.

First, the fact that an MOS has subgroups of soldiers with widely varying
reenlistment probabilities is demonstrated. As an example, Infantrymen (MOS 11B)
have a 34% reenlistment rate over the past six years. However, when the MOS is
partitioned into two categories by DEPENDENT STATUS (one category is single soldiers
without dependents, and the second category is married and single soldiers with
dependents)20 these two categories display reenlistment rates that differ by up to 20%.
Figure 5 shows the example for Infantrymen (MOS 11B).

This result is not unique. Figure 6 shows three other MOS's which also display
the same characteristic. Additionally, Figure 6 shows that all MOS's taken together also
display about a 20% difference between the reenlistment rates for soldiers with and
without dependents. Although the actual rates differ somewhat by MOS (there are many
different factors interacting in this simple example) the general trend holds.

There are other variables that have similar characteristics. For example, Figure
7 shows Infantrymen (MOS 11B) partitioned into categories by RACE.


20 Dependents may be children, elderly parents or any other legal dependent.

Figure 5. Reenlistment Rates for MOS 11B, Zone A by Dependent Status

Figure 6. Reenlistment Rates for Differing MOS's by Dependent Status

Figure 7. Reenlistment Rates for MOS 11B, Zone A by Race


The different racial groups have differing reenlistment rates, by up to 15%.
There are many other examples, some of which are summarized in Table 1. Percentages
are for all MOS's taken together, and do not necessarily include all categories.


Table 1. REENLISTMENT RATES BY CATEGORY, FOUR VARIABLES

From this simple example it is possible to see that an MOS is not a homogeneous
grouping of soldiers with respect to reenlistment propensity. There are categories
of the MOS that display widely differing probabilities of reenlisting. These results are
seen in most MOS's analyzed.

Once we establish that the MOS is not a homogeneous grouping of soldiers with
similar reenlistment rates, we also want to show that different MOS's are comprised of
varying percentages of soldiers from the different categories. To illustrate this, a simple
example using Infantrymen (MOS 11B), Unit Supply Specialist (MOS 76Y), and
Programmer/Analyst (MOS 74F), and the variable RACE is provided.

Figure 8 below gives the percentage of each race that comprises the given MOS.
It is readily seen that the differing MOS's are not comprised of the same proportions of
the racial groups. Again this is a general result found with many variables and most
MOS's.


Figure 8. Racial Composition of Three MOS's


The results to this point are as follows:

• MOS's are comprised of categories of soldiers with different probabilities of
reenlisting.

• Soldiers in a given category will display similar probabilities of reenlisting in many
different MOS's.

• MOS's are comprised of different proportions of the categories.


3. Example of Methodology

Using these observations, we can predict reenlistment rates for MOS's using a procedure
illustrated by the following trivial example.

Over the past six years, the reenlistment rate for Infantrymen (MOS 11B)
averaged 34%; for the Unit Supply Clerk (MOS 76Y) the rate averaged 46%. An
explanation for this difference is that MOS 76Y is comprised of higher proportions of soldiers
with higher probabilities of reenlistment. Table 2 provides the example.


Table 2. REENLISTMENT RATES COMPARISONS

Variable         | MOS 11B        | MOS 76Y        | Difference in reenlistment rates
Sex              | [illegible]    | [illegible]    | Females reenlist at a rate [illegible] higher than males
Race             | 20% Black      | 35% Black      | Blacks reenlist at a rate 14% higher than whites
Dependent Status | 32% Dependents | [illegible]    | Soldiers with dependents reenlist at a rate 20% higher


Again, this trivial example explains the higher reenlistment rate of MOS 76Y by
demonstrating that it is comprised of higher proportions of soldiers who reenlist with higher
probabilities. This example provides the motivation for our approach.
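
The composition argument can be made concrete with a small sketch. The numbers below are hypothetical and chosen only to illustrate the arithmetic; they are not the rates or proportions reported in this study. The function name `mos_rate` is likewise an assumption for the example.

```python
def mos_rate(shares, rates):
    """Overall rate for an MOS as a share-weighted average of category rates."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to one"
    return sum(shares[c] * rates[c] for c in shares)

# Hypothetical category reenlistment rates, identical in both MOS's.
category_rates = {"no_dependents": 0.30, "dependents": 0.50}

# Hypothetical compositions: the second MOS holds more soldiers with dependents.
mos_a = {"no_dependents": 0.70, "dependents": 0.30}
mos_b = {"no_dependents": 0.40, "dependents": 0.60}

rate_a = mos_rate(mos_a, category_rates)  # 0.70 * 0.30 + 0.30 * 0.50
rate_b = mos_rate(mos_b, category_rates)  # 0.40 * 0.30 + 0.60 * 0.50
```

Even though each category reenlists at the same rate in both hypothetical MOS's, the MOS with the larger share of the higher-propensity category shows the higher overall rate.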
4. Assumption of the Methodology 

A significant assumption is made at this point. The method of this study forms
homogeneous groupings of soldiers by looking for similar probabilities of reenlisting.
We assume that soldiers with similar probabilities of reenlisting will display similar
bonus response rates. Work by one researcher supports this assumption. He shows that
soldiers exhibit similar bonus and pay response rates by demographic groups [Ref. 11].


5. Motivation for Variable Reduction


There are 40 explanatory variables available to explain the reenlistment
decision-making process of a soldier. It is not practical to continue with a 40-dimensional
problem, and therefore part of the methodology is to reduce the number of variables.
The reasons why this is important are as follows:

• Including 40 variables would require the prediction of those 40 variables each time
the model is run.

• Including 40 explanatory variables increases the chance for collinearity within the
regression model, which reduces model performance.

• Including 40 explanatory variables (over 20 of which are categorical variables) will
require the estimation of over 100 coefficients. A regression equation of this size
lacks the parsimony necessary for a good model.

• Most of the explainable variance in reenlistment response rates can be explained
with considerably fewer than 30 variables.

Therefore, variable reduction will be an important part of the solution method.
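
The coefficient count in the third point follows from how categorical variables enter a regression: under the usual indicator (dummy) coding, a variable with k categories contributes k - 1 coefficients. The sketch below shows the arithmetic. The category counts are hypothetical, since the actual counts appear in the tables and appendices rather than in this passage.

```python
def coefficient_count(category_counts, n_continuous):
    """Intercept, plus one coefficient per continuous variable,
    plus (k - 1) coefficients per categorical variable with k categories."""
    return 1 + n_continuous + sum(k - 1 for k in category_counts)

# Hypothetical split of 40 variables: 20 categorical with five categories each,
# and 20 continuous.
n_coef = coefficient_count([5] * 20, 20)  # 1 + 20 + 20 * 4
```

Under these assumed category counts the model already needs 101 coefficients, before any interaction terms are considered.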


C. METHODOLOGY 
As a result of the above discussion, this study adopts the following solution steps.

• Select influential categorical variables using log-linear models.
• Partition the population into cells with similar reenlistment probabilities.
• Reduce the number of cells using cluster analysis.
• Select influential continuous variables using logistic regression.
• Estimate reenlistment rates for each cell using logistic regression.
• Compute projected reenlistment rates for each MOS as a linear combination across
all cells.
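
The last two steps can be sketched as follows. This is an illustration under stated assumptions, not the fitted model of this study: each cell is given a hypothetical logistic response with a single bonus-level term, and the intercepts, bonus coefficients, and cell shares are invented for the example.

```python
import math

def cell_rate(intercept, bonus_coef, bonus_level):
    """Logistic reenlistment rate for one cell at a given bonus level."""
    return 1.0 / (1.0 + math.exp(-(intercept + bonus_coef * bonus_level)))

def mos_projection(cells, bonus_level):
    """Projected MOS rate: cell rates weighted by the MOS's share in each cell.

    cells is a list of (share, intercept, bonus_coef) tuples; shares sum to one.
    """
    return sum(share * cell_rate(a, b, bonus_level) for share, a, b in cells)

# Hypothetical MOS made up of two cells.
cells = [(0.6, -1.0, 0.3), (0.4, 0.0, 0.2)]

rate_no_bonus = mos_projection(cells, 0)
rate_bonus_2 = mos_projection(cells, 2)
```

Evaluating `mos_projection` over a range of bonus levels gives the kind of rate-versus-bonus curve that a bonus allocation model would take as input.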

The use of log-linear models for the categorical variables, and the logistic models for
the continuous variables, is suggested since the study uses a binary response variable.
Influential variables are defined as variables that are likely to be statistically significant
predictors of reenlistment rates, and are identified through exploratory data analysis
using log-linear and logistic models. The cluster analysis addresses the issue of sparse cells.
Cluster analysis, log-linear models and logistic regression are all discussed in more detail
in Chapter V, Appendix I and Appendix J.

V. ZONE A ANALYSIS AND RESULTS


A. GENERAL

The purpose of this chapter is to demonstrate the application of the methodology
outlined in Chapter IV to the Zone A reenlistment problem.


B. SELECTION OF INFLUENTIAL CATEGORICAL VARIABLES 

The first step is to select influential categorical variables, for use in partitioning the
Zone A population into cells of soldiers who have similar probabilities of reenlisting.

There are thirty categorical variables available to partition the population, with
some of the variables having ten to twenty categories. In the worst case, the problem
is partitioned into 8 x 10% cells. Clearly this is an unmanageable number of cells.

The approach to reducing the number of variables is to use exploratory data analysis
techniques. In addition to reducing the number of variables, opportunities to reduce the
number of categories within a variable are also explored.

1. Exploratory Data Analysis of Categorical Variables

This study uses a systematic approach of exploratory data analysis on the
categorical variables. It can best be described as a bottom up method. The approach
starts by first understanding the data through the study of the variables' distributions
and simple univariate procedures, and then increases dimensionality with bivariate and
multivariate techniques. This approach is advocated in data analysis books such as
[Ref. 22: pp. 316-319].

One problem with this approach is that it is impractical to test a large
percentage of the interactions of groupings of three or more variables. For example, to test all
interactions of three variables would require

\[ \binom{30}{3} = 4060 \tag{2} \]

different models. This study uses an approach outlined in Freeman and Jekel [Ref. 23:
pp. 514-519] to discover interesting multivariate groupings. Freeman and Jekel
recognize that the variables of potential interest may be hidden in a forbiddingly large
cross-classification scheme and that there is a tradeoff between trying to reduce the number
of variables and the potential of losing valuable information. Therefore, they propose
the following procedure.
• Perform a test for independence between each pair of variables.

• If two variables are dependent, then form a compound variable using them.
Compound variables are formed by combining two variables together into a single
variable with categories corresponding to all combinations of categories of the
variables being combined.

• Perform a test for independence between these compound variables and all other
variables.

• Form new compound variables for each pair consisting of a compound variable and
a single variable that are dependent.

• Continue this process until cell frequencies become small (less than one). At this
point, terminate the selection process, and choose the variables with the most
significant associations for inclusion in the reduced table.21 [Ref. 23: pp. 513-518]
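
The first two steps of the procedure can be sketched as follows. This is an illustrative sketch only: the passage does not state which test of independence is used, so the Pearson chi-square statistic is an assumption, as are the function names and the toy data. The critical value 3.841 is the standard chi-square cutoff for one degree of freedom at the 0.05 level.

```python
from collections import Counter

def chi_square(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (joint[(r, c)] - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof

def compound(xs, ys):
    """Combine two variables into one whose categories are all observed pairs."""
    return [f"{x}|{y}" for x, y in zip(xs, ys)]

# Hypothetical toy data: 16 soldiers, two binary variables.
sex = ["M"] * 8 + ["F"] * 8
deps = ["N"] * 7 + ["Y"] + ["Y"] * 7 + ["N"]

stat, dof = chi_square(sex, deps)
CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05

# Form the compound variable only if independence is rejected.
sex_deps = compound(sex, deps) if stat > CRITICAL_1DF_05 else None
```

The compound variable would then be tested against each remaining variable in the same way, repeating until the cells become too sparse.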

The goal of this section is to produce a parsimonious model [Ref. 24: p. 156].
For reasons of readability, we do not present every test conducted within the paper.
Instead an example or two is presented to show the procedure, and then the results
are summarized.

2. Exploratory Data Analysis Tools 

There are two primary types of models to use on categorical data. They are linear
models, as described by Grizzle, Starmer and Koch [Ref. 25: pp. 491-492] and log-linear
models, as described by Bishop, Fienberg and Holland [Ref. 26: pp. 25-37].

This study will primarily use the log-linear models for the study of categorical
variables. Log-linear models work especially well in analyzing contingency tables of
three or more dimensions [Ref. 27: p. 207] and are useful in testing hypotheses about the
nature of relationships between two or more categorical variables [Ref. 24: p. 143].
Appendix H gives the background of log-linear models.

3. Distribution of Variables 

The first step in the systematic approach to data analysis is to study the
distributions of the individual variables. Table 3 lists the thirty categorical variables, and
gives the range and type of measurement scale of the variable. The right column is
explained below.


21 The procedure outlined does not guarantee selection of the best table, nor should it always
be followed rigorously. Instead, in the spirit of exploratory data analysis, it is a rational, easily
implemented procedure to select an interesting table.


Table 3. MEASUREMENT SCALE AND RANGES FOR CATEGORICAL VARIABLES

The most significant result of the study of individual distributions concerns the
number of observations in each category. Variables are of two types. One type, of
which the variables TERM OF ENLISTMENT and SEX are typical, has a large
number of observations in one category. Figure 9 shows the uneven frequency distribution
of TERM OF ENLISTMENT and SEX. Table 3 has a Yes in the right column for
variables of this type.

Figure 9. Frequency Counts For Selected MOS's


The second type of variable, of which CIVILIAN OPPORTUNITY OF JOB
SKILL and REGION OF COUNTRY ENLISTED FROM are typical, has the bulk
of frequencies spread over many values. Figure 9 shows the larger number of categories
with a significant number of observations for the variables CIVILIAN OPPORTUNITY
OF JOB SKILL and REGION OF COUNTRY ENLISTED FROM. These variables
have a No in the right column of Table 3.

When the population is partitioned using variables that have a large number of
observations in one category (and therefore other categories with extremely small
numbers of observations), this causes a large number of sparse cells. The issue of sparse cells
is addressed at great length later in the study; however, it is important to understand the
causes of those sparse cells.
4. Univariate Analysis 
The first result of univariate analysis concerns variables having interval
measurement scales. Figure 10 shows the reenlistment rates for the categorical variable AGE
AT ENLISTMENT, an example of a variable with an interval measurement scale.
Clearly the older soldiers are, the higher their probability of reenlisting. However, the
variance increases significantly as age increases, due to the decreasing number of
observations.
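
The link between the number of observations and the variability of an observed rate can be illustrated with the standard error of a binomial proportion, sqrt(p(1 - p)/n). The sketch below uses hypothetical counts and a hypothetical rate; it is not a computation on the study's data.

```python
import math

def rate_standard_error(rate, n):
    """Standard error of an observed proportion based on n observations."""
    return math.sqrt(rate * (1.0 - rate) / n)

# Hypothetical: the same underlying rate observed in a large and a small age category.
se_large = rate_standard_error(0.4, 10000)
se_small = rate_standard_error(0.4, 25)
```

With 400 times fewer observations the standard error is 20 times larger, which is why the rates for the sparsely populated older ages fluctuate so much more.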


Figure 10. Reenlistment Rates for all MOS's, by Age at Enlistment


AGE AT ENLISTMENT is one of the interval variables that can be treated
either as a categorical variable or as a continuous variable. Although it could be recoded
into fewer categories, it is not intuitive to do so, because of the generally increasing
probability to reenlist as age increases. Additionally, because the bulk of the
observations are in the left tail, numerous sparse cells result. Analysis such as this leads us to
drop the following variables from consideration as categorical variables. They will be
reconsidered as continuous variables.

• Age at Enlistment
• Age at Separation
• Years of Service
• Number of Years to Military Retirement
• Reenlistment Bonus Multiplier


There are numerous variables in which hypothesized relationships are not
validated by the univariate analysis. Among these are:

• Enlistment Bonus
• Enlistment Program
• Youth Program
• Retirement System
• Type of Bonus Payment
• Job Skill Migration
• Reserve Time
• Duty Location


Some of these variables are rejected due to data problems. For example,
ENLISTMENT BONUS has far fewer soldiers coded as receiving a bonus
than are known to have received them. Some of the variables are
dropped because there is no significant difference in the reenlistment probabilities for
different categories. For example, ENLISTMENT PROGRAM is dropped for this
reason. Finally, some variables are discarded because of interactions with other factors.
For example, DUTY LOCATION is discarded because analysis shows reenlistment rates
of over 95% for soldiers stationed overseas. However, further analysis shows that
soldiers who near the end of their term of service overseas are brought back from overseas
prior to their discharge, while reenlisting soldiers remain overseas. If not corrected for,
this leads to a biased assessment of the effect of DUTY LOCATION on the reenlistment
rate.

The final univariate analysis result involves reduction in the number of
categories in certain variables. Figure 11 shows why MENTAL CATEGORIES are recoded
from seven categories to four categories. Categories 2-5 have statistically similar
reenlistment probabilities, and therefore are recoded into one category.

Figure 11. Reenlistment Rates by Mental Category and by Rank


Figure 11 shows how the variable CURRENT RANK is recoded as three
groupings, even though there clearly appear to be four distinct groupings. However,
when the frequency numbers are examined, the E6 category contains fewer than 200 of the
15,788 observations. Since the E6 category is not statistically different from the E5
category, they are combined without loss of precision.
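
Recoding of this kind amounts to a simple mapping from original categories to merged ones. The sketch below is illustrative: the mapping of E6 into E5 follows the text, but the helper name and the sample of ranks are hypothetical.

```python
from collections import Counter

# Per the text, the sparse E6 category is merged into E5.
RANK_RECODE = {"E6": "E5"}

def recode(values, mapping):
    """Replace each value by its merged category; unmapped values pass through."""
    return [mapping.get(v, v) for v in values]

# Hypothetical sample of current ranks.
ranks = ["E3", "E4", "E4", "E5", "E6", "E4"]
recoded = recode(ranks, RANK_RECODE)
counts = Counter(recoded)
```

The same helper would cover the MENTAL CATEGORIES recoding by mapping categories 2 through 5 to a single label.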

Analysis shows significant differences in reenlistment rates by home state.
Clearly, however, including the fifty state categories is impossible. Since there appear
to be regional trends, the first step is to categorize the states into the nine standard
United States regions. While categorization into these regions is a good first step, there
are still some inconsistencies, and the number of categories is still too great. Therefore,
the states are further categorized into five regions. Figure 12 shows the reenlistment
rates for those five regions. Analysis shows that these categories are stable over time.




Similarly, the Army's 350 military job specialities are grouped into three general
categories, which is our subjective evaluation of the civilian opportunities available to
soldiers with different job skills.


REENLISTMENT RATES
FOR REGIONS OF THE COUNTRY

Figure 12. Reenlistment Rates for Regions of the Country


At the end of the univariate analysis, 17 variables remain. All have between two
and five categories.

5. Multivariate Analysis
One of the purposes of the multivariate analysis is to choose between groups of variables
that are clearly collinear. The first of these groups is the variables which measure
education levels:
• Education at Enlistment
• Education at Reenlistment

• Change in Education


The second group measures dependent status.

• Dependent Status at Enlistment
• Dependent Status at Reenlistment

• Change in Dependent Status


The third group measures race and ethnic groups.
• Race

• Ethnic Group


The analysis confirms the dependence between the variables, and gives guidance
as to the best variables to select. The variables are:
• Education at Reenlistment
• Dependent Status at Reenlistment

• Ethnic Group


As a result of this analysis, 12 categorical variables are retained. These 12 are
listed in Table 4, along with their final categories.


Table 4. REMAINING CATEGORICAL VARIABLES

6. Table Selection
To further reduce the number of variables, the procedure (described on page 32)
by Freeman and Jekel [Ref. 23: pp. 514-519] is applied to the remaining 12 variables. The
first step in selecting the multi-dimensional table is to examine the dependence of all
pairs of variables. The analysis of the dependence uses Cramer's test [Ref. 23: pp.
514-519] as a measure of association. The significant pairs of variables are TD, GR, SR,
RH, and JE. This first table is not displayed due to its size; however, it is constructed
similarly to Table 5 below.
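
The pairwise dependence check can be illustrated with a short sketch. It assumes the measure of association is the Cramér's V statistic computed from the Pearson chi-square of a two-way table; the function name and the two small tables are hypothetical and are not taken from the thesis data.

```python
import math


def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalize by sample size and the smaller table dimension minus one
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical counts: rows of the first table are proportional (no association),
# while the second table is perfectly associated.
independent_pair = [[20, 30, 50], [40, 60, 100]]
dependent_pair = [[50, 0], [0, 50]]

v_independent = cramers_v(independent_pair)
v_dependent = cramers_v(dependent_pair)
```

Values near zero suggest the pair can be treated as independent, while values near one flag a pair that would be combined into a compound variable in the next step.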

The second step in selecting the multi-dimensional table is to form a compound
variable from each dependent pair of variables as described on page 32, and then test the
dependence of the compound variables with all remaining variables [Ref. 23: p. 517].
Table 5 shows the results.


Table 5. ASSOCIATIONS WITH COMPOUND VARIABLES

Significant tables are TDG, SRJ, JER, and HGR. Continuing on in this manner
leads to the following results.
7. Results of Exploratory Data Analysis 
As a result of the exploratory data analysis, the following variables are used to
partition the data set:

• Term (2 categories)
• Rank (3 categories)
• Sex (2 categories)
• Race (3 categories)
• Dependents (2 categories)
• Region (5 categories)
• Job Type (3 categories)


C. PARTITIONING OF THE POPULATION INTO HOMOGENEOUS CELLS 

The purpose of this step is to partition the population into homogeneous cells
containing soldiers with similar probabilities of reenlisting. The partitioning variables
are the influential categorical variables selected in the above step.

Using the seven categorical variables with between two and five categories each to
partition the population creates a total of 1080 cells. A random sample of 75,788 Zone
A soldiers shows that 859 of the cells have non-zero frequencies, 162 have over 100
observations, and 12 have over 1000 observations.
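
The count of 1080 cells is the product of the category counts of the seven partitioning variables. A minimal sketch of the enumeration follows; the dictionary layout and the integer category codes are illustrative assumptions, and only the category counts come from the list above.

```python
from itertools import product
from math import prod

# Number of categories per partitioning variable, as listed in the text
categories = {
    "Term": 2,
    "Rank": 3,
    "Sex": 2,
    "Race": 3,
    "Dependents": 2,
    "Region": 5,
    "Job Type": 3,
}

# Total number of cells is the product of the category counts
n_cells = prod(categories.values())

# Each cell is one combination of category codes (codes 0..k-1 are placeholders)
cells = list(product(*(range(k) for k in categories.values())))
```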

Clearly, this is too many cells. Additionally, the sparse cells (those approximately
350 cells with under 25 observations) do not perform well in regression. Therefore,
further reduction of the number of cells must occur.


D. CELL REDUCTION 
1. Cell Reduction Procedure
There is considerable literature concerning cell reduction of multidimensional
contingency tables. These studies identify three primary ways to reduce multidimensional
tables [Ref. 28: p. 546] [Ref. 29: pp. 325-329]. These three methods are:
• Reduce the Number of Variables
• Reduce the Number of Categories in a Variable

• Combine Cells Within the Multidimensional Contingency Table


Of these three techniques, the first two are fully exploited in previous sections.
Analysis shows that further reduction using these techniques results in significant loss
of information. Therefore, we turn to techniques to combine cells within the
multidimensional table to further reduce the number of cells.

Combining cells within the multidimensional table using cluster analysis is the
technique used in a thesis by Larsen [Ref. 30: pp. 22-34]. The problem he solves is
estimating retention rates for Marine Corps officers. He partitions his population into
cells using years of service, job speciality, and source of commission. Similarly to this
thesis, he ends up with many sparse cells, and combines them using cluster analysis.

While this study does not use the computerized cluster analysis techniques of
the Larsen study, the ad-hoc procedure used follows the same principles. The primary
reason for not using the computer package is the existence of special structure in the
problem, which is not fully exploited by the package.

The special structure in this problem is the existence of a subset of variables
which have a large percentage of the observations in one category, and therefore other
categories with few observations. An example of this is the variable SEX, which has less
than 8% women. An extremely large proportion of the cells that have this category
associated with them are sparse cells.

The second part of the special structure is that the variables having the large
percentage of the observations in one category also have the most significant differences
in probabilities to reenlist between cells. For example, in the case of the variable SEX,
the category WOMEN is a relatively homogeneous grouping, requiring little further
categorization. The ad-hoc procedure of this study exploits this structure to combine
cells by examining the variables in the following order:

• Term of Enlistment
• Sex
• Rank
• Dependents
• Race
• Region
• Job Type

This ordering examines those variables with the largest percentage of large
categories first.

2. Cell Reduction Results 

Using the ad-hoc cluster analysis procedure reduces the number of cells from
1080 to 92. All cells have at least 37 observations (from a random sample of 75778
observations). Only five of the cells have under 100 observations, and 24 of the cells have
over 1000 observations.

Although variable reduction is proceeding, there are still too many cells.
Therefore, cells are further combined, this time by grouping cells with similar
reenlistment probabilities. Cells are grouped only if they fall into a three percentage
point window. Attempts are made to group like cells; this goal is slightly relaxed to
facilitate groupings.
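
The three percentage point window can be sketched as a simple greedy pass over the cells sorted by reenlistment rate. This is only an illustration of the window rule under assumed inputs: the cell names and rates are hypothetical, and the sketch ignores the additional judgment, described above, of keeping like cells together.

```python
def group_by_window(cell_rates, window=0.03):
    """Greedily group cells whose rates lie within `window` of the group's lowest rate."""
    groups = []
    for name, rate in sorted(cell_rates.items(), key=lambda item: item[1]):
        # groups[-1][0][1] is the lowest rate in the most recent group
        if groups and rate - groups[-1][0][1] <= window:
            groups[-1].append((name, rate))
        else:
            groups.append([(name, rate)])
    return groups


# Hypothetical cell reenlistment rates (not from Appendix J)
cell_rates = {"A": 0.07, "B": 0.09, "C": 0.095, "D": 0.14, "E": 0.16, "F": 0.80}

grouped = group_by_window(cell_rates)
group_names = [[name for name, _ in group] for group in grouped]
```

Anchoring each window at the lowest rate in the group guarantees that no two cells in a group differ by more than the window width.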

Thirty-six cells result from the second iteration of cell reduction. Reenlistment
rates vary from 7% to 80% within these cells. The smallest cell has 232 observations (from
the 75778 observation sample), and 20 of the 36 have over 1000 observations. Appendix J
lists the composition of each of the 36 cells, and the reenlistment rates for each group.


E. SELECTION OF INFLUENTIAL CONTINUOUS VARIABLES 
1. Exploratory Data Analysis of Continuous Variables

The purpose of this section is to select the influential continuous variables for
inclusion in the regression equations. The technique is exploratory data analysis, using
a bottom up approach as described earlier in this chapter. The primary tool is logistic
regression. Appendix I describes these techniques in detail.

The section begins with 20 potential variables. The goal is to choose five to 
seven for inclusion in the regression equations. 

Since the reenlistment population is partitioned into 36 different cells, this
analysis could be performed separately for each cell. However, this entails a prohibitive
amount of work. Instead, the exploratory data analysis is performed on the entire
population. This is compensated for by the separate stepwise regression on each cell.

A general observation of the exploratory data analysis is that although there are
significant relationships between many of the explanatory variables and the response
variable, few of the variables account for a large portion of the variance in reenlistment
probabilities. This result lowers considerably the expectations for the amount of the
variance the overall model explains.

2. Distribution of Individual Variables 

The purpose of this section is to examine the distribution of the continuous
variables. The logistic regression model requires no specific distributional assumptions
(for example, normality). However, the regression model gives inaccurate estimates if the
variables do not have sufficient range and spread. Table 6 shows the range, mean, and
standard deviation for the continuous variables. All the variables have adequate range
and spread. A second issue is the scale of the variables in relationship to each other.
Regression techniques often do not perform well if the variables are widely scaled. The
scales in this case are moderate, and a well-behaved model is anticipated.

3. Univariate Analysis 

The primary purpose of the univariate analysis is to select the influential
variables for inclusion in the regression equations.

Table 6. RANGES, MEANS AND STANDARD DEVIATIONS FOR CONTINUOUS VARIABLES

Figure 13 gives the results of a logistic regression to test the significance of the
variable BONUS LEVEL on the probability of reenlisting, using the SAS LOGIST
procedure. Of note are two items. First is the low R value. Appendix I discusses the
R value for logistic regression in detail; it is analogous to the R in ordinary least squares
regression, which is a measure of the fit of the model. The second item of note is the p
value. This represents the following hypothesis test.

H0: Coefficient Estimate is Zero (3)
H1: Coefficient Estimate is Not Zero (4)


The specific test is a Wald test for zero slope, and the test statistic is closely
approximated by a Chi-square distribution [Ref. 31: p. 191]. The low p value in Figure 13
represents a low probability that the variable BONUS has a slope of zero, and therefore
a low p (< 0.05) represents the rejection of the null hypothesis, and strongly suggests
that the bonus does have an effect on reenlistment rates.
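
As a rough check of the Wald test, the statistic can be recomputed from the BONUS coefficient and standard error reported in Figure 13. The sketch assumes the usual single-coefficient form, the squared ratio of the estimate to its standard error, compared against the 0.05 critical value of a chi-square distribution with one degree of freedom (3.841). Because the printed inputs are rounded, the result only approximates the chi-square value shown in the figure.

```python
# Coefficient and standard error for BONUS as printed in Figure 13
beta_bonus = 0.158
std_error = 0.0084

# Wald chi-square for a single coefficient: (estimate / standard error)^2
wald_chi_square = (beta_bonus / std_error) ** 2

# Critical value of chi-square with 1 degree of freedom at the 0.05 level
CRITICAL_VALUE_05 = 3.841

reject_zero_slope = wald_chi_square > CRITICAL_VALUE_05
```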




LOGISTIC REGRESSION PROCEDURE

DEPENDENT VARIABLE: RCODE

73481 OBSERVATIONS
45697 LEAVE = 0
27784 REUP = 1
0 OBSERVATIONS DELETED DUE TO MISSING VALUES

VARIABLE     MEAN        MINIMUM    MAXIMUM        S.D.
BONUS        0.485935    0          [illegible]    [illegible]

CONVERGENCE IN 15 ITERATIONS    R = 0.060

VARIABLE     BETA      STD. ERROR     CHI-SQUARE    P         R
INTERCEPT    -0.576    [illegible]    4349.01       0.0001
BONUS        0.158     0.0084         354.48        0.0001    0.060


Figure 13. Regression of Bonus Level vs Reenlistment Probability 
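
The intercept and slope in Figure 13 are on the logit scale. A minimal sketch of converting them to a reenlistment probability follows, assuming the standard logistic form P = 1 / (1 + e^-(α + βX)) with X the bonus level. The function name and the range of bonus levels (0 through 5) are illustrative assumptions.

```python
import math

ALPHA = -0.576  # intercept from Figure 13
BETA = 0.158    # slope for BONUS from Figure 13


def reenlist_prob(bonus_level, alpha=ALPHA, beta=BETA):
    """Logistic transform of the linear predictor alpha + beta * bonus_level."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * bonus_level)))


# Predicted probability at a few illustrative bonus levels
rates_by_bonus = {level: reenlist_prob(level) for level in range(0, 6)}

# At the sample mean bonus, the prediction should sit close to the
# observed overall rate of 27784 reenlistments out of 73481 soldiers.
rate_at_mean_bonus = reenlist_prob(0.485935)
observed_rate = 27784 / 73481
```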


The above example has an estimated intercept of -0.576 and a slope of 0.158 for
the variable BONUS LEVEL. These, however, are the transformed intercept and slope
(see Appendix I for a full explanation). To get the actual reenlistment probability
at a given bonus level, Equation 5 is used, where α and β are the intercept and slope
terms, and X is the bonus level.

$$P = \frac{1}{1 + e^{-(\alpha + \beta X)}} \qquad (5)$$

A plot of this function is in Figure 14.


REENLISTMENT RATE
AS A FUNCTION OF BONUS MULTIPLIER

Figure 14. Plot of Bonus Level vs Reenlistment Probability


A second purpose of the univariate analysis is to "fine tune" the variables. An
example of this is to plot the unemployment rate just prior to a soldier's reenlistment
date, and also lagged by two months, then six months and nine months, and see which
is most influential on the reenlistment probability. The issue is much more complicated
than this, however, because there are issues of which unemployment rates to choose (for
the entire population or for certain age groups), whether to choose local, regional or
national rates, and whether to choose unadjusted or seasonally adjusted rates. Clearly
this level of detail is beyond the scope of this thesis: whole studies have addressed just
the one issue of which unemployment rate to use. Some limited work is done on the
continuous variables; however, for the most part we have relied on the literature to point
the way in choosing continuous variables. The limited results achieved in this analysis
are incorporated in Chapter III.
4. Bivariate and Multivariate Analysis 

One major issue of this analysis is collinearity. When variables included in the
regression are collinear or linear combinations of each other, they reduce the precision
of the coefficient estimates. There is significant potential for collinearity in the
estimation of reenlistment rates. The reason is that the longer soldiers remain in the
service, the higher their probability of reenlistment becomes. Therefore, any variable
that increases as a function of a soldier's time in the service shows a positive correlation
with the reenlistment probability. Examples of these variables are many. Rank increases
with a soldier's increasing time in service, and pay amount is a function of rank and time
in the service. Generally the number of dependents a soldier has increases with service, as
does his education level, and his age. A soldier's initial term of service is positively
correlated with his time in service. These are all examples of potentially collinear
variables, which adversely affect the precision of the coefficient estimates. Therefore,
extreme care is taken to ensure that variables that are collinear are not included.

To test for collinearity, regressions are performed on pairs of potentially
collinear variables. If the variables display a high R value, then they are highly collinear,
and one of the variables is not included in the regression model. For example, the two
variables AGE AT ENLISTMENT and AGE AT SEPARATION are potentially
collinear. A regression of these variables has an R value of 0.9229. This high R value
is the first clue of the collinearity of these variables. If collinear variables are included,
the regression model will indicate a better model fit than is justified by the data. A full
explanation of collinearity and its effects on regression models is found in Mosteller and
Tukey [Ref. 32: pp. 280-284].
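
The pairwise collinearity screen can be sketched as follows. For a regression of one variable on another, the R value equals the absolute Pearson correlation, so the sketch computes that directly. The ages below are hypothetical, and the 0.9 cutoff is an assumed threshold for illustration, not a value stated in this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ages for seven soldiers (illustrative only)
age_at_enlistment = [18, 19, 20, 22, 25, 18, 21]
age_at_separation = [21, 23, 23, 26, 28, 22, 25]

r_value = pearson_r(age_at_enlistment, age_at_separation)

# Assumed screening threshold: drop one variable of any pair above it
COLLINEARITY_THRESHOLD = 0.9
drop_one_variable = abs(r_value) > COLLINEARITY_THRESHOLD
```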


5. Results of Exploratory Data Analysis

As a result of the exploratory data analysis of the continuous variables, the
study includes the following variables in the regression models:
• Unemployment Rate at Reenlistment
• Promotion Rate
• AFQT Score
• Pay
• Bonus Level
• Reenlistment System
• Age at Entry

F. ESTIMATION OF REENLISTMENT RATES


A stepwise logistic regression is performed on each of the 36 cells, using the
procedures outlined in Appendix I. Appendix K contains a table of results. The table
contains the estimated coefficients, plus the R value for each regression. Additionally,
Appendix K gives the results of the hypothesis test to see if the coefficient is statistically
different from zero.


Equation 6 below gives an example of the bonus equations for one of the cells, Cell
[illegible].

$$P = \frac{1}{1 + e^{\,[\text{illegible}] \,-\, 0.209 \times \text{Bonus} \,+\, 0.012 \times \text{AFQT} \,+\, 0.057 \times \text{Age at Entry}}} \qquad (6)$$

Analysis of the results in Appendix K leads to the following observations:

• The R values for all the regression equations are low. This was expected, as the
estimation of reenlistment rates is a difficult problem. This is because many factors
play into a soldier's decision to reenlist; we can only hope to capture some of those
reasons with measurable variables.

• Although the R values are small, the explanatory variables included have low p
values, indicating that the estimated coefficients are significantly different
from zero.

• There are some cells for which the bonus level did not significantly influence the
reenlistment rate.

G. COMPUTATION OF MOS REENLISTMENT RATES


The final step in the procedure is to calculate the reenlistment rate for the MOS as
a linear combination across all the cells. To illustrate how this is done, an example is
provided.


In this example, the reenlistment rates for MOS 11B (Infantryman) are computed
for 1990. The following information is estimated for next year.

• The unemployment rate will be 5.0%.

• MOS 11B's promotion rate average will be higher than other MOS's, so that the
average 11B soldier is promoted six months sooner than the average.

• The AFQT average score will be [illegible].
• The pay raise for next year will be 3.2%.
• The reenlistment system will remain liberal.

Additionally, the average 11B soldier eligible to reenlist next year was 19 years old
when he enlisted.

Figure 15 gives the projected breakdown, by cell, of MOS 11B for soldiers eligible to
reenlist next year. Computing the reenlistment rate for MOS 11B gives the results in
Table 7.
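
The linear combination across cells can be sketched as a weighted sum of the cell-level logistic predictions, with each weight equal to the share of the MOS's eligible soldiers falling in that cell. All numbers below are hypothetical: the three cells, their shares, and their coefficients are invented for illustration, and the non-bonus covariates are assumed to be already folded into each cell's intercept.

```python
import math


def cell_prob(intercept, bonus_coef, bonus_level):
    """Cell-level logistic prediction with other covariates folded into the intercept."""
    return 1.0 / (1.0 + math.exp(-(intercept + bonus_coef * bonus_level)))


# Hypothetical cells: (share of eligible soldiers, intercept, bonus coefficient).
# The third cell has a zero bonus coefficient, i.e. it does not respond to bonuses.
mos_cells = [
    (0.50, -0.60, 0.20),
    (0.30, -0.20, 0.10),
    (0.20, 0.40, 0.00),
]


def mos_rate(cells, bonus_level):
    """MOS reenlistment rate as the share-weighted sum of cell probabilities."""
    return sum(share * cell_prob(a, b, bonus_level) for share, a, b in cells)


# Rate at each illustrative bonus level, analogous in layout to Table 7
rate_table = {level: mos_rate(mos_cells, level) for level in range(0, 6)}
```

Because the weights sum to one, the MOS rate always lies between the lowest and highest cell rates, and cells with a zero bonus coefficient dampen the overall response to the bonus.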


Table 7. REENLISTMENT RATES FOR MOS 11B
Bonus Level    Reenlistment Probability

H. MODEL VALIDATION

Since the data set was partitioned prior to the beginning of the analysis,
cross-validation of the regression models is possible using the remaining data.

The cross-validation is conducted on the 36 cells, rather than on the 350 MOS's. Table
8 shows the results for a randomly selected number of the cells. The first column shows
the estimated reenlistment rates for the cell over the past six years. The second column
shows the actual reenlistment rates. The excellent fit of the model is seen just by
comparing these two columns. The fit is confirmed through use of a chi-square
goodness-of-fit test. The procedure followed is the same as described in Appendix J. The
model is rejected at the α = 0.05 level if the test statistic is greater than 3.841. Clearly,
these results confirm the validity of the regression models.
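
A minimal sketch of the goodness-of-fit check for a single cell follows. It assumes a two-category test (reenlist or leave) with one degree of freedom, which is consistent with the 3.841 critical value quoted above; the exact procedure of Appendix J is not reproduced here, and the counts are hypothetical.

```python
def chi_square_gof(n_eligible, n_reenlisted, predicted_rate):
    """Pearson chi-square comparing observed stay/leave counts with the model's prediction."""
    expected_stay = n_eligible * predicted_rate
    expected_leave = n_eligible * (1.0 - predicted_rate)
    observed_leave = n_eligible - n_reenlisted
    return ((n_reenlisted - expected_stay) ** 2 / expected_stay
            + (observed_leave - expected_leave) ** 2 / expected_leave)


# Critical value of chi-square with 1 degree of freedom at the 0.05 level
CRITICAL_VALUE_05 = 3.841

# Hypothetical hold-out cell: 1000 eligible soldiers, 310 reenlist,
# while the model predicts a 30.5% reenlistment rate.
statistic = chi_square_gof(1000, 310, 0.305)
model_rejected = statistic > CRITICAL_VALUE_05
```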

A second part of the model validation is to check the residuals of the regression
model. There are no indications of problems with the residuals. Appendix I discusses
the form of the logistic regression residuals.


I. MODEL PRECISION 

The military reenlistment bonus model is a deterministic model which optimizes
estimated means, and requires point estimates of reenlistment rates. However, we feel
obligated to discuss confidence intervals on those point estimates. We recommend
that the users of the military reenlistment bonus model conduct sensitivity analysis, by
varying reenlistment rates in order to understand how the estimate impacts on their
decisions. The confidence intervals provide guidance on the reenlistment rate values that
should be used for worst and best case estimates.


Table 8. RESULTS OF MODEL VALIDATION 


Estimated Reenlistment Rate | Actual Reenlistment Rate
30.5% +08% 1027 


61.4% 





The military reenlistment bonus model does not accept confidence intervals as model
inputs. Therefore, instead of generating a table of 350 MOS confidence intervals that
would not be used, we instead provide a general rule of thumb to guide the selection of
values for sensitivity analysis. In general, the predicted rate ± 10% gives a 70%
confidence interval, and the predicted rate ± 15% gives a 95% confidence interval. These
worst case estimates also attempt to account for additional error that results from
inaccuracies in estimating the inputs to the reenlistment model, such as the unemployment
rate.

Figure 15. Breakdown of MOS 11B by Cell


CONCLUSIONS


A. FINDINGS 

This study develops a methodology for estimating reenlistment rates for use in the 
military reenlistment bonus model. It departs significantly from methods of previous 
studies in that it does not group MOS's into skill families or other similar groupings. 
Instead this study looks for homogeneous groupings of soldiers with similar probabilities 
of reenlisting, and develops regression models for these groupings. 

There is strong statistical evidence that certain groups of soldiers have very different
reenlistment propensities. These groupings are best defined by categorical variables,
which partition the population into cells of soldiers who are homogeneous with respect
to their reenlistment probability. This study assumes that these groups are also
homogeneous with respect to their response to changes in bonus levels. There is some prior
research to support this assumption [Ref. 11: p. 212].

Many researchers include one or two categorical variables in their regression
equations. Few, however, exploit the full potential of these variables. Including more
categorical variables leads to many cells with low expected frequencies.

To overcome the low expected frequencies, this study first partitions the population
into cells and then groups cells. The grouping procedure uses the principles of cluster
analysis to take advantage of special problem structure by finding the variables most
likely to create low expected frequency cells. The resulting grouped cells contain cells
with nearly the same statistical reenlistment probabilities. Regression models are
developed for each grouping of cells, and MOS reenlistment rates as a function of bonus
level are calculated as a linear combination across the cells.

Most of the regression equations had low R² values. These low R² values do not
invalidate the model for several reasons. First, the grouping of the cells by clustering is a
variance-reduction step. The R² for the regression models indicates the amount of
variance within the groups that is explained. Since the grouping of cells reduces the
variance within a cell, the potential for further reduction is limited. Second, while the R²
is low, the variables included in the regression models are statistically significant. Third,
the study is hampered by the quality of the national economic variables. Variables such as
GNP, UNEMPLOYMENT RATE and CIVILIAN JOB GROWTH are quantified at
an aggregated level. Finer resolution data (by quarter and by geographic location)
would help further explain variance. Fourth, the low R² value is not unexpected in this
type of problem. This study tries to explain a soldier's reenlistment propensity using
nationally measurable variables. However, surveys of soldiers show that the reenlistment
decision making process is complex, involving issues as complex (and unmeasurable) as
a soldier's relationship with his peers, and his job satisfaction. Given this, it is not
surprising that the R² is low. Finally, despite the low R², the models are validated using
cross-validation. This cross-validation finds the models to be highly predictive,
credible models of significant value.

A noteworthy finding of this study is that the variable BONUS LEVEL is not
significant in numerous cells. In other words, soldiers in these cells do not respond to
increasing cash bonuses. Obviously bonuses should not be allocated to MOS's with high
percentages of soldiers from these cells.

One of the difficulties of this study is the inability to quantitatively measure items
such as the effectiveness of the reenlistment system in providing soldiers with their
desired reenlistment option. However, the results of the subjective variable
REENLISTMENT SYSTEM are extremely interesting. This variable measures how
"liberal" the reenlistment system is in providing soldiers their reenlistment options. It is
significant in as many equations as is the bonus level. The most recent improvement in
this area is a program called the Commanders Override, in which the computerized
reenlistment system is manually overridden to keep a soldier in the service by providing
his or her reenlistment option choice. Clearly, programs such as these are an alternative
to the cash reenlistment bonus.

Another finding is the significance of the variables to measure a soldier's motivation
to join the service. These enlistment variables are important in determining the first term
reenlistment model. Among these variables are TERM OF ENLISTMENT, SEX,
RACE, REGION, JOB TYPE and AFQT PERCENT. Since many of the enlistment
variables are significant in the Zone A reenlistment model, further study of other
enlistment variables is in order. There is an enlistment data base which was not available
for this study that contains numerous variables of potential interest. Since enlistment
demographics appear significant to the first-term reenlistment decision, one way to
improve first-term reenlistments is to target for enlistment those groups of soldiers who
display the highest reenlistment propensities.

A finding of this study is that the potentially complicating issues of MOS
migrations, extensions and reenlistment windows can be ignored, with only minor loss of
accuracy in the reenlistment estimates. This greatly simplifies the reenlistment model.
Appendix B discusses this issue in detail.


This study developed an alternative technique to previous methods of grouping
MOS's. This method was cross-validated with data not used in the development
of the model. The results are highly predictive of reenlistment rates, and responses to
bonuses.


B. RECOMMENDATIONS 


The estimates of Zone A reenlistment rates developed in this study should be 


adopted for use in the military reenlistment bonus model. 


The procedures outlined in this study should be replicated to estimate the Zone B 


and Zone C reenlistment rates. 


C. RECOMMENDATIONS FOR FURTHER STUDY 


This study does not analyze the composition of the grouped cells to any great
extent. However, one could potentially gain considerable insight into the
reenlistment decision making process from exploring the composition of each cell,
and explaining why certain groups of soldiers cluster together. Similarly, detailed
examination of the cells in which the bonus level is significant should be conducted
in order to understand what types of soldiers respond to bonuses and why.


Further attempts need to be made to quantify and study the force alignment
variables (such as pay, promotion rates and the form of the reenlistment system) which
impact on the reenlistment program. These variables are potentially as powerful
as the reenlistment cash bonus.


The enlistment data base from the Military Entrance Processing Command should
be examined for further enlistment variables to explain the first term reenlistment
decision. This data base was not available for this study. Several enlistment
variables were significant in this study's model; however, there are many other
enlistment variables still to examine. Examples of variables that should be examined
include variables that measure the income of a soldier's parents and the
military background of the soldier's parents and siblings.


This study used a type of cluster analysis procedure to reduce the number of cells. 
However, numerous other techniques are available for use. Many of the techniques 
are discussed in a thesis by Misiewicz [Ref. 33: pp. 1-15]. Further research should 
examine these additional procedures, particularly shrinkage using Empirical Bayes. 


The annualized cost of leaving (ACOL) model described in Chapter II, together
with more detailed economic variables, should be incorporated into this methodology.


Finally, as an alternate solution technique, the use of intervention analysis should
be explored. An article by Box and Tiao should serve as a starting point [Ref.
34: p. 70].

APPENDIX A. THE MILITARY REENLISTMENT BONUS MODEL


A. GENERAL 

The military reenlistment bonus model is a mathematical programming model for
optimizing the allocation of reenlistment cash bonuses in order to achieve the desired
force structure. The model is essentially a deterministic model. The model was
developed at the Naval Postgraduate School by Major Dean DeWolf, Major Jim Stevens, and
Professor Kevin Wood, and is currently used by the U. S. Marine Corps and the U. S.
Army [Ref. 1: pp. 1-3].


B. INPUTS 
The inputs for the model are by military occupation speciality (MOS). They include:
• Current force structure
• Desired force structure
• Number of soldiers eligible to reenlist
• Training costs

• Projected reenlistment rates at each bonus level²²

Additionally, inputs include the bonus budget, and the maximum size bonus a soldier is
allowed to receive.


C. OUTPUT
Output from the model is recommended bonus levels for each of the 350 MOS's in
each of their three zones. The model also outputs the projected force structure after the
bonus payments.


D. OBJECTIVE FUNCTION 
The objective function measures the deviation from the desired force structure. 
Deviations in some MOS's are weighted higher because of the MOS’s criticality, or be- 


cause of the higher investment in training the Army has in certain soldiers. 


E. SOLUTION METHODOLOGY 
The model is formulated as a linear integer program, and is solved using Lagrangian
relaxation. The solution on a mainframe computer averages under ten seconds.


22 Determining the projected reenlistment rate at each bonus level is the purpose of this study.

F. MODEL USE

Because of the short run time, and the ease of input and interpretation of results,
this model is extremely valuable to an analyst who must compare numerous alternative
solutions, and perform sensitivity analysis of input variables. Although not specifically
designed for use by budget analysts, the model can also be useful in budget development.

APPENDIX B. CALCULATION OF REENLISTMENT RATES


A. GENERAL 
The purpose of the appendix is to explain how this study deals with four potentially
complicating issues in the calculation of reenlistment rates. These issues are:
e MOS Migration 
e Extensions 
e Reenlistment Eligibility 


e Early Reenlistments 


How the study addresses these four issues has a profound impact on the calculation
of the reenlistment rate. Therefore we start simply by defining how to calculate a
reenlistment rate.

$$\text{Reenlistment Rate}_i = \frac{\text{Number of Soldiers Reenlisting in MOS}_i}{\text{Number of Soldiers Eligible}} \qquad (7)$$

Each of the complicating factors potentially impacts on this rate calculation. The
simplifying assumptions to prevent this are presented here.


B. MOS MIGRATION 

MOS migration is when a soldier in an overstrength MOS reenlists into an
understrength MOS. MOS migration is encouraged at the reenlistment point as a way
to align the Army's force structure. The issue is how to count migrating soldiers in the
calculation of reenlistment rates.

MOS migration affects the numerator of the reenlistment equation. There are four
different ways to count migrating soldiers.

• Count in the numerator only soldiers in MOS_i who reenlist in MOS_i.

• Count in the numerator soldiers from MOS_i who reenlist in MOS_i and those
from all other MOS_j, j ≠ i, who reenlist for MOS_i.

• Make the reenlistment decision a multinomial choice, to either reenlist for MOS_i,
reenlist for any MOS_j, j ≠ i, or not reenlist.

• Count in the numerator soldiers in MOS_i who reenlist in any MOS_j, including i.


By process of elimination, the study chooses the first method of calculation. The 


second method 1s rejected because there ts no practical wav to predict how many soldiers 


of other MOS's will choose to reenlist in J7OS, The third choice. the multimi | 
choice. is rejected due to a technical aspect of the multinomial logit model. This solution 
technique works well only in cases in which there are three distinct choices. “Mer ce 
of the choices (to reenlist in OS, and to reenlist in ./OS,) are so similar as to render 
the technique ineffective (Ref. 35; p. 362]. The fourth option is rejected because 
not reflect the number of soldiers who remain in a MOS, which 1s vital information for 
the military reenlistment bonus model. Therefore the first option is selected. The benefit 
is this option keeps the model simple, and although there is some potential to underes- 


tumate the actual numbers of soldiers reenlisting for MOS, it 15 the best option. 
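The first counting method can be illustrated with a short sketch. This is only an illustration under stated assumptions: the record layout (a pair of current MOS and the MOS reenlisted for, or `None` for a soldier who leaves), the function name `reenlistment_rates`, and the sample MOS codes are all hypothetical and are not taken from the study's data files.

```python
from collections import Counter

def reenlistment_rates(records):
    """Rate per MOS under the first counting method.

    records: iterable of (current_mos, reenlisted_mos) pairs, one per soldier
    reaching ETS; reenlisted_mos is None when the soldier does not reenlist.
    (Hypothetical layout, assumed for illustration only.)
    """
    eligible = Counter()         # denominator: every soldier reaching ETS in the MOS
    reenlisted_same = Counter()  # numerator: soldiers who reenlist in their own MOS
    for current_mos, reenlisted_mos in records:
        eligible[current_mos] += 1
        if reenlisted_mos == current_mos:
            reenlisted_same[current_mos] += 1
    return {mos: reenlisted_same[mos] / eligible[mos] for mos in eligible}

# Made-up sample: one soldier migrates from "11B" to "31C".
sample = [
    ("11B", "11B"), ("11B", None), ("11B", "31C"), ("11B", "11B"),
    ("31C", "31C"), ("31C", None),
]
rates = reenlistment_rates(sample)
```

In this sample the migrating soldier stays in the denominator of the losing MOS but appears in no numerator, which is the source of the potential underestimate noted above.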


C. EXTENSIONS

Some researchers, such as Goldberg and Warner, treat extensions as a separate decision. They use a multinomial model of three choices (extend, reenlist, and leaving the service) [Ref. 36: p. 17]. This study rejects this approach, and instead chooses to treat extensions as a deferred reenlistment decision. Therefore, only a soldier's reenlistment decision is counted in the reenlistment rate calculation. This will cause bias in the rate calculation only if soldiers extend in great numbers and for long periods. However, less than one in seven soldiers extend, and their primary reason for extending is to become reenlistment eligible. This method of treating extensions is supported by the research by Cymrot. His conclusion is that the effects of extensions are small (less than 10), and he recommends that the inputs to the reenlistment models do not have to be modified to account for extensions [Ref. 37: pp. 44-46]. Therefore, extensions are ignored, at only a small cost to the accuracy of the model, and at a large benefit to the model's simplicity.


D. REENLISTMENT ELIGIBILITY 

This study counts all soldiers who reach their end of term of service (ETS) as eligible to reenlist. This is not the normal interpretation, as many soldiers are declared ineligible to reenlist because they do not meet the Army's minimum reenlistment standards. However, the difficulty with the normal approach is that the data in the gain loss file designating reenlistment eligible soldiers is widely regarded as unreliable [Ref. 5: p. 26]. Any reenlistment rate based on this data is also unreliable.

Therefore the best approach is to declare all soldiers who reach ETS as eligible to reenlist. Since reenlistment eligibility standards have remained relatively unchanged over the past ten years, this is not an unreasonable approach. The estimation of the number of soldiers ineligible to reenlist then becomes a transparent part of the reenlistment rate computation.


E. EARLY REENLISTMENTS 

Currently, soldiers are permitted to reenlist up to eight months prior to their ETS date.23 This issue complicates the reenlistment rate calculation by changing the numerator of the reenlistment equation.

In his study Cymrot shows that there is no simple way to account for the effect of early reenlistments on the reenlistment rate, and that the forecast error of reenlistment rates is about 2% due to it [Ref. 38: p. 26]. This study recommends that soldiers be counted as eligible to reenlist on only one date, arbitrarily set at six months prior to their ETS date.24 This again greatly simplifies the model, although it causes the potential for some bias in the estimation. The bias is in the case of rising bonus levels, when soldiers who have previously decided not to reenlist change their minds due to a new, higher bonus. In the case of falling bonus levels, there is no bias.

23 Through FY87, first term soldiers were allowed to reenlist six months prior to the end of their service term, and all other soldiers were permitted to reenlist three months prior. Since FY88, all soldiers are permitted to reenlist eight months prior to the end of their service term.

24 Since 50% of soldiers reenlist eight to six months prior to their ETS, and 35% of soldiers reenlist six to three months prior, the six month date is not unrealistic.

APPENDIX C. VARIABLES TO MEASURE INITIAL MOTIVATION FOR MILITARY SERVICE

The purpose of this appendix is to more fully explain a soldier's initial motivation for military service. This is part of the conceptual framework of the military decision-making process introduced in Chapter III.

The data for these variables comes from the Army gain loss file, except for the unemployment rate information, which is from the Bureau of Labor Statistics.


ACF  Army College Fund (ACF). In a very interesting study of the Navy enlisted force, one researcher found that educational programs reward military personnel leaving the service by providing what is in effect a negative reenlistment bonus, in the form of educational benefits that can only be used by a full time civilian student [Ref. 39: p. 2]. It is hypothesized here that a soldier motivated for military service by college money is less likely to reenlist.

ENLISTMENT BONUS  Studies show that soldiers receiving a reenlistment bonus at their first reenlistment point are less likely to reenlist once they reach their second reenlistment point [Ref. 40: p. 701]. Is there a similar effect for soldiers receiving enlistment bonuses? If enlistment bonuses bring people into the service who otherwise do not enlist, then these soldiers may show a lower propensity to reenlist than other soldiers. The Army also uses enlistment bonuses to induce people to enlist in less popular job skills. These soldiers may be more likely to migrate to a new job skill at the end of their enlistment.

ENLISTMENT TERM  One theory is that a longer enlistment term may indicate a stronger initial career intent on the part of the soldier. This is mitigated, however, because a soldier must enlist for four years to earn an enlistment bonus, and soldiers receiving enlistment bonuses may have less career intent.

PROGRAM  Enlistment Program. This variable shows which enlistment or training program the soldier reenlists for. The purpose is to determine whether a soldier is job, training or education orientated. Studies show that soldiers in these different groups have different propensities to reenlist and also respond differently to outside factors such as the state of the national economy [Ref. 30: p. 701]. The enlistment program and training the soldier selects gives insight into the soldier's initial orientation.

AGE AT ENLISTMENT  Is there a correlation between age at enlistment and enlistment motivation? One study by the RAND Corporation shows a strong correlation between age at enlistment and first term attrition25 [Ref. 41: p. vi]. It is hypothesized here that age at enlistment is also a predictor of enlistment intent.

AGE AT SEPARATION  Because soldiers enlist for different terms, age at separation is not exactly correlated to age at enlistment. Older soldiers are expected to reenlist at higher rates than younger ones.

EDUCATION  Education at enlistment. Initially, only a variable for education at reenlistment was included in this study (see Appendix D for discussion of the variable Education). However, education at enlistment can potentially explain a soldier's motivation for entering the service. Therefore, it is included here also.

DEPENDENTS  Dependents at enlistment. Similar to education, a soldier's dependent status at enlistment is included as a variable in this study.

PRIOR SERVICE  Has the soldier with prior military service followed by a break in service explored both the civilian and military opportunities available, and now indicated with his or her choice a strong career intention?

RESERVE TIME  Is a soldier who spends time in the Reserves or National Guard and then decides to come on active duty more career oriented than the average soldier?

YOUTH PROGRAM  Participation in military youth programs such as high school ROTC may indicate that this individual, like reserve and prior service soldiers, has made comparisons of both civilian and military options available from a perspective not available to the average person.

HOMETOWN  Location, along with the economic conditions at that location, is strongly related to enlistment propensity according to one study [Ref. 42: p. 230]. Hometown information is converted to regional information for use in this variable. The regions are further combined, so that five large regions are formed. States in each region have soldiers with similar reenlistment rates.

UNEMPLOYMENT RATE  The unemployment rate is examined as an indicator of an individual's motivation to enter the military. Two different unemployment rates are used here. One is the average state unemployment rate for the 13 months prior to the soldier enlisting. The other is the national rate for the same period. The justification for using these rates comes from a study on the sensitivity of first term Navy reenlistments to changes in unemployment and relative wages [Ref. 40: p. 698]. Unemployment data comes from the Bureau of Labor Statistics [Ref. 43: p. 8].

25 Soldiers under the age of 18 show significantly higher first term attrition rates than older soldiers.
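The averaging step behind the unemployment variable can be sketched as follows. This is a minimal sketch under assumptions: the function name `prior_window_mean`, the list-of-monthly-rates layout, and the choice to exclude the enlistment month itself from the window are illustrative and are not stated in the study. The sample rates are made up.

```python
def prior_window_mean(monthly_rates, enlist_index, window=13):
    """Mean of the `window` monthly rates immediately before enlist_index.

    monthly_rates: list of unemployment rates in chronological order.
    enlist_index: position of the enlistment month in that list.
    Assumption: the enlistment month itself is excluded from the window.
    """
    start = enlist_index - window
    if start < 0:
        raise ValueError("not enough history before the enlistment month")
    values = monthly_rates[start:enlist_index]
    return sum(values) / len(values)

# Made-up state series: 24 months rising steadily from 7.0 percent.
state_rates = [7.0 + 0.1 * month for month in range(24)]
avg_rate = prior_window_mean(state_rates, enlist_index=20)
```

The same function would apply unchanged to a national series, which keeps the state and national variables comparable over the same period.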


APPENDIX D. VARIABLES TO MEASURE THE SOLDIER'S SUCCESS IN THE SERVICE

The purpose of this appendix is to further describe variables which measure a soldier's success in the service, and his or her satisfaction with military life. This is part of the conceptual framework of the reenlistment decision-making process introduced in Chapter III. All data comes from the Army gain loss file except where noted.


CHARACTER OF SERVICE  At each reenlistment point, the soldier receives a character of service. This is a gross indicator of previous performance, because if the character of service is anything less than honorable, the soldier is not permitted to reenlist.

PROMOTION RATES  The promotion rate of a soldier compared to his or her peers within the same military occupation speciality appears to be the best way to measure a soldier's success within the military. Soldier enlisted evaluation report scores and skill qualification test scores also look promising, but data is not available. The use of promotion rates as an indicator of success in the military is well supported by one study [Ref. 16: p. v]. The method of calculating promotion rates is the same used by Warner in his master's thesis [Ref. 17].

AFQT SCORE  Armed Forces Qualification Test. Two studies, one of them by an NPS student, use intelligence and education scores to predict promotion rates. AFQT, plus the following three variables (mental test category, GT score, and education level), are measures of intelligence and education, although each comes with serious and well documented shortcomings as a measurement tool. Additionally, the results of studies which use these variables as predictors are not particularly strong [Ref. 16: p. 3] [Ref. 17: p. 120]. Despite its shortcomings, the Army makes frequent use of this measure of intelligence.

MENTAL TEST CATEGORY  This variable is also one of those used to predict promotion rates. Mental test category is a discrete version of the AFQT, ranging from 1 (highest) to 5 (lowest). Each category is further broken into sub-categories. The mental test category is hampered by the same inconsistencies described for the AFQT.

GT TEST SCORE  General-Technical Test Score on the Armed Services Vocational Aptitude Battery. Another of the variables used to predict promotion rates. The Army uses this test score data to measure trainability.

EDUCATION LEVEL  The final variable used to predict promotion rates. The problem with the measure of education level available in the data base is that it does not distinguish between soldiers who are high school graduates and those who earn a high school equivalency credential (GED).26

CHANGE IN EDUCATION  Since the study examines education at enlistment and education at the reenlistment point, it also examines whether soldiers who have improved their education level during their enlistment term have different reenlistment probabilities than those who do not.

YEARS-OF-SERVICE  An Army Research Institute researcher discusses the use of tenure in the service as a predictor of organizational commitment and reenlistment propensity [Ref. 44: pp. 5-6]. He measures tenure with four components: years-of-service, status, rank and increasing responsibility. Data is available on years-of-service and rank.

CURRENT RANK  A second measure of tenure.

DUTY LOCATION  This study uses duty location as a quality of life variable. A study of first term reenlistment decisions finds that Army enlistees who are stationed overseas have a higher reenlistment rate, and those stationed in the northeast United States have a lower reenlistment rate than average [Ref. 8: p. 23]. The duty station is converted into regional or overseas location.

DEPENDENT STATUS  Researchers note that quality of life issues are relatively insignificant for the first term soldier [Ref. ?: pp. 11-13]. The reason may be that many first term soldiers do not yet have families, while later term soldiers do. Soldiers with families, or who support dependents, should reenlist at higher rates than single soldiers do. This thesis defines a soldier as having dependents if he has any legal dependents, whether they are children, parents, or other relatives.

CHANGE IN DEPENDENTS  Does a soldier who starts his or her family while in the military display a different reenlistment propensity than single soldiers, or those who entered with families? This variable addresses the issue.

26 Education level data which distinguishes between GED graduates and high school diploma graduates is only available from 1985 on.

APPENDIX E. VARIABLES TO MEASURE A SOLDIER'S POTENTIAL IN THE CIVILIAN SECTOR

The purpose of this appendix is to more fully explain a soldier's evaluation of his or her potential in the civilian sector. This is part of the conceptual framework of the reenlistment decision making process introduced in Chapter III. The data in this group comes from the appropriate government agency, and from the Army gain loss file.


RACE  The study includes race and sex as surrogate variables to describe a soldier's evaluation of his or her potential in the civilian sector versus the military. Researchers find higher reenlistment rates among black soldiers than white soldiers. The researchers hypothesize this is due to several factors, such as insufficient job opportunities for blacks in the civilian sector as compared to military career options, and enhanced promotion opportunities in the military [Ref. 14: pp. 29-30]. Therefore, race becomes an indicator of differing opportunities available to soldiers in the civilian sector and the military.

ETHNIC GROUP  For similar reasons as for race, a soldier's ethnic group is included as a variable.

SEX  Studies also note higher reenlistment rates among women than among men27 [Ref. 14: p. 29]. Again, researchers hypothesize this represents more opportunities for women in the military than they find in the civilian sector [Ref. 14: pp. 29-30].

JOB TYPE  The purpose of this variable is to attempt to capture different civilian opportunities for differing job categories. Most researchers agree that soldiers with "high tech" training have greater civilian opportunities than do others [Ref. ?: p. 253]. This variable also captures the expected lower bonus response rates for jobs that are risky or dangerous [Ref. ?: p. 231]. The Army administratively groups job skills into categories called career management fields (CMF), which we do not use because CMFs often group occupations with little in common [Ref. 5: p. 4].28 This study uses instead modified groupings from the Department of Defense Occupation Conversion Manual [Ref. 45: pp. 9-17].

UNEMPLOYMENT RATE  Numerous studies find unemployment rates positively correlated with retention rates, and that unemployment rates reflect civilian employment opportunities [Ref. 6: p. 16]. Additionally, the unemployment rate (along with GNP and CPI) indicates the health of the national economy [Ref. 2: p. 34]. A study for the U. S. Navy titled "The Sensitivity of First Term Reenlistment to Changes in Unemployment and Relative Wages" addresses the wide range of issues dealing with which unemployment rates to use29 [Ref. ?: p. 54]. This study uses two: the state unemployment rate for the 13 months prior to the soldier's enlistment (discussed in Appendix C), and the national unemployment rate for the three quarters prior to the soldier's making his reenlistment decision. Unemployment data comes from the Bureau of Labor Statistics [Ref. 43: p. 8].

C/M WAGE INDEX  Civilian Military Wage Index. Surprisingly, studies do not find civilian military pay indexes to be explanatory of the reenlistment decision making process. Only one Navy study finds them to be significant predictors of reenlistments [Ref. 36: p. 32]. Numerous others find this not to be true [Ref. 14: p. 111] [Ref. 40: p. 707] [Ref. 8: pp. 33-36] [Ref. 9: pp. 30-43]. The difficulty here is trying to measure the civilian earning potential of soldiers. One approach is to use veterans' earnings as a way to estimate the earning potential of soldiers in the civilian sector. However this introduces selection bias to the data, because veterans who choose to leave the service do so because they expect higher civilian earnings than those who stay. Therefore any estimate of civilian wage potential based on veterans' earnings is upwards biased [Ref. 11: p. 203] [Ref. 46: p. v]. Another difficulty with measuring civilian pay opportunities of soldiers is matching military skills with skills found in the civilian sector. Despite the above shortcomings, this study includes the civilian military wage index as a variable. The source of data is the Bureau of Labor Statistics [Ref. 43: pp. 115-177].

CPI  Consumer Price Index. Like unemployment and gross national product, CPI is a general measure of the state of the national economy, and therefore employment opportunity. The source of data is the Bureau of Labor Statistics [Ref. 47: pp. 13-16].

GNP  Gross National Product. GNP also indicates the health of the national economy, and therefore indicates the civilian employment prospects of military personnel. None of the studies reviewed for this paper include GNP as a variable, although GNP is the most frequently used measure of the state of the national economy. GNP data is from the U. S. Department of Commerce [Ref. 48: p. 3].

CIVILIAN JOB GROWTH  This study hypothesizes that the percentage growth in civilian jobs is a more accurate indicator of actual employment opportunities than is the unemployment rate. Data come from the Bureau of Labor Statistics [Ref. ?].

27 Women have a higher attrition rate than men during the first term. However, if they complete the first term, women reenlist at a higher rate than men.

28 For example, CMFs group job skills as diverse as a cannon crewman and a Pershing missile electronics specialist into the same category.

29 The issues break down into whether to use national, regional, or local unemployment rates; whether to use the rates for all workers or those for the 17-24 age group; and how much the effects of unemployment should be led or lagged.

APPENDIX F. REENLISTMENT POLICY VARIABLES

The purpose of this appendix is to more fully explain the reenlistment policy variables in this study. The variables are part of the conceptual framework of the reenlistment decision making process of Chapter III. Data in this section comes from the Army gain loss file except where noted.


RETIREMENT SYSTEM  The purpose of this variable is to account for changes in the retirement system made four years ago. Soldiers enlisting before this date received benefits under the old retirement system. The new retirement system is less generous than the old one [Ref. 14: pp. 29-30].

YEARS TO RETIREMENT  One of the strongest predictors of reenlistment behavior is the number of years to retirement. However, this variable is most useful in predicting Zone B and Zone C reenlistment rates. The years to retirement has little influence on Zone A soldiers, with the major impact not felt until the seventh year [Ref. ?].

RMC  Real Military Compensation. RMC is a measure of compensation that accounts for the fact that not all of a soldier's income is in the form of direct pay. RMC accounts for the housing and subsistence allowances that soldiers receive either in cash or in kind (in the form of government housing). RMC also counts as income the tax advantage a soldier gets because housing and subsistence payments are not taxable. Because the military compensation system is so complex, there is considerable evidence that soldiers systematically and significantly undervalue their compensation [Ref. 41: p. vi]. Changes in pay rates, rather than actual pay rates, were used in this study.

ADJUSTED RMC  This variable takes into account how pay (and other forms of military compensation) keeps pace with inflation.

BONUS PAYMENTS  The bonus payment level is the policy variable Army policy makers can most easily manipulate. Since bonuses are paid to soldiers in job skills with low retention rates, normally the presence of a bonus indicates that the job skill is in high civilian demand or is an unpopular or demanding job. Bonus payment data comes from the Force Alignment branch of the U. S. Army Total Army Personnel Command.

TYPE BONUS PAYMENT  The method of computing the amount of a reenlistment cash bonus has not changed since 1974. However, the method of payment has changed. From April 1979 to January 1982, the cash bonus was paid to the soldier in a lump sum on the day of reenlistment. In 1982 the method changed from a lump sum to a one-half lump sum payment, with the remainder of the bonus paid in yearly installments. Studies show that the full lump sum payment induces more soldiers to reenlist than the alternate payment system [Ref. 6: p. 6]. The data base includes records of soldiers under both payment systems. Bonus type data comes from the Force Alignment branch of the U. S. Army Total Army Personnel Command.

SKILL MIGRATION  The Army permits selected soldiers to change job skills at the reenlistment point. The force alignment needs of the Army dictate the number of soldiers who change job skills. The Army offers soldiers in overstrength MOS's the opportunity to change to understrength MOS's. These soldiers normally do not receive a bonus; however, their reward for changing MOS's is increased promotion opportunity in the new MOS. This variable measures whether the soldier is in an overstrength MOS and eligible to reenlist. Migration opportunity data comes from the Force Alignment branch of the U. S. Army Total Army Personnel Command.

PROMOTION FORECAST  An earlier variable looks at the promotion rate of a soldier with respect to his peers. This variable looks at the promotion rate as a force alignment variable which the Army manipulates. Promotion forecasts come from the Force Alignment branch of the U. S. Army Total Army Personnel Command.

ELIGIBILITY  Reenlistment eligibility criteria change over time. The data base contains a variable coding reenlistment eligibility; however, this designation is highly suspect [Ref. 5: p. 26]. We are not able to independently determine from the data records whether a soldier is eligible to reenlist, as reenlistment eligibility depends partially on discipline and performance records not available in the data. This variable measures which set of reenlistment eligibility criteria is in effect at the time the soldier reenlists.

REENLISTMENT SYSTEM  The purpose of this variable is to attempt to quantify how liberal the reenlistment system is in giving a soldier his or her reenlistment choice of training or duty assignment. This study subjectively assigned values to this variable, based on interviews with the reenlistment managers at the U. S. Army Total Army Personnel Command. The general feeling is that through FY83, the reenlistment system was moderately responsive to soldiers' needs. From FY84 through FY87, the reenlistment system was less responsive to soldiers' needs, and during FY88 and FY89 it has been more highly responsive to soldiers' needs. This assessment is due to changes in the reenlistment system that occurred on 1 October 1983, and on 1 April 198?.


APPENDIX G. MISSING DATA 


A. PURPOSE 
The purpose of this appendix is to show the amount of missing data present in the 
data set after cleaning, and to demonstrate why no further cleaning of the data set is 


required. 


B. MISSING DATA AFTER CLEANING 
Table 9 contains a listing of the 30 categorical variables, and the amount of missing data present after cleaning. The amount of remaining missing data ranges from 0-7.8%, with 23 variables missing less than 1%.


C. RANDOM MISSING DATA 

To determine if further cleaning of the data is necessary, the data set is examined to see if the observations with missing data are a random sample of the data set. If they are, then eliminating the observations with missing data will not change the results of the analysis, and additional cleaning will not be needed.

First, the number of observations with at least one missing value is calculated, using the ten variables from Table 9 with the most missing data. The results are in Figure 16.


DATA MISSING    FREQUENCY
No                  69570
Yes                  6208


Figure 16. Number of Observations With Missing Values 


As can be seen, only 8.2% of all observations have one or more missing values. This amount is acceptable, provided the observations with missing values are randomly distributed throughout the data set. To determine this, we test the hypothesis that the reenlistment rate for those with missing data is the same as the reenlistment rate for those without missing data. Figure 17 gives the reenlistment rates.




Table 9. MISSING DATA FOR CATEGORICAL VARIABLES

Variable Name                             Percentage of Data Missing
Education at Enlistment                   0.04%
Education at Reenlistment
Change in Education
Dependent Status at Enlistment
Dependents at Reenlistment
Change in Dependents
Character of Service
Mental Test Category
Ethnic Group
Retirement System                         0.00%
Number of Years to Military Retirement    0.00%
Type of Bonus Payment                     0.00%
Job Skill Migration                       6.49%
Reenlistment Bonus                        0.00%


RESPONSE PROBABILITIES

DATA MISSING    NO REENLIST    REENLIST
No              0.617824       0.382176
Yes             0.619845       0.380155


Figure 17. Reenlistment Rates for Observations With Missing Data 


Obviously, the reenlistment rate for those observations missing data is very close to that for those not missing data. To show this formally, we test the hypothesis:

$$H_0: P_1 = P_2 \tag{8}$$

$$H_1: P_1 \neq P_2 \tag{9}$$

where $P_1$ is the probability of reenlisting for an observation without missing data, and $P_2$ is the probability of reenlisting for an observation with missing data. The test statistic is:

$$T = \frac{N \left( O_{11} O_{22} - O_{12} O_{21} \right)^2}{n_1 n_2 C_1 C_2} \tag{10}$$

where $N$ is the total number of observations, $n_1$, $n_2$, $C_1$, $C_2$ are the row and column totals, and $O_{11}$, $O_{12}$, $O_{21}$, $O_{22}$ are the observed cell counts. The critical region is to reject $H_0$ at $\alpha = 0.05$ if $T$ exceeds $\chi^2_{1-\alpha}$, the $1-\alpha$ quantile of a chi-square random variable with 1 degree of freedom [Ref. 27: pp. 145-146]. Since the computed value of $T$ is much less than the critical value of 7.879, we do not reject the null hypothesis. The level of significance of the test is greater than $\alpha = 0.25$.
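The test can be retraced numerically with a short sketch. The cell counts below are not printed in the appendix; they are an assumption, recovered by multiplying the group sizes in Figure 16 (69570 and 6208) by the response proportions in Figure 17 (0.617824 for complete observations and 0.619845 for observations with missing data, with the second column taken as the complement). The function name is hypothetical, and 7.879 is simply the critical value quoted in the text.

```python
def two_by_two_statistic(o11, o12, o21, o22):
    """Chi-square statistic for a 2 x 2 contingency table, in the form
    T = N (O11*O22 - O12*O21)^2 / (n1 * n2 * C1 * C2)."""
    n1 = o11 + o12          # row totals
    n2 = o21 + o22
    c1 = o11 + o21          # column totals
    c2 = o12 + o22
    n = n1 + n2             # total number of observations
    return n * (o11 * o22 - o12 * o21) ** 2 / (n1 * n2 * c1 * c2)

# Assumed counts: group size times response proportion, rounded to whole soldiers.
complete = (round(69570 * 0.617824), round(69570 * 0.382176))
missing = (round(6208 * 0.619845), round(6208 * 0.380155))

t_stat = two_by_two_statistic(*complete, *missing)
CRITICAL_VALUE = 7.879      # value quoted in the text
reject_null = t_stat > CRITICAL_VALUE
```

Under these assumed counts the statistic comes out far below the critical value, which agrees with the decision not to reject the null hypothesis.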

Therefore, since the missing values appear to be randomly distributed throughout the data set, further cleaning of the data is not required.

APPENDIX H. LOG-LINEAR MODELS

The purpose of this appendix is to explain the use of log-linear models in the study of categorical data sets. The log-linear model is analogous to the familiar analysis of variance (ANOVA) techniques, except that log-linear models are for dichotomous response variables, where the ANOVA is for continuous response variables. Both are for use with categorical explanatory variables.

The standard log-linear model is Equation 11, where $p_i$, $p_j$, $p_k$ are the probabilities associated with the different variables.

$$\text{Rate} = A \, p_i \, p_j \, p_k \tag{11}$$

Taking the natural logarithm of this equation yields Equation 12.

$$\ln \text{Rate} = \ln A + \ln p_i + \ln p_j + \ln p_k \tag{12}$$

The SAS statistical procedure CATMOD uses a maximum likelihood estimate solved by an iterative proportional fitting procedure to yield estimators that are the best asymptotic normal estimators [Ref. 49: p. 345]. The properties of the iterative method of proportional fitting of the log-linear model are summarized from Bishop [Ref. 26: p. 83].

• It always converges to the required estimates.

• A stopping rule is available to ensure the desired accuracy is obtained.

• Starting values may be set for the estimates.
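The three properties listed above can be seen in a small sketch of iterative proportional fitting for a two-way table. This is an illustration only, not a reproduction of what SAS CATMOD does internally: the function name, the tolerance, and the made-up margins are assumptions, and the model fitted is the simple independence model (row effect plus column effect on the log scale).

```python
def iterative_proportional_fit(row_targets, col_targets, start=None,
                               tol=1e-10, max_iter=1000):
    """Fit a two-way table whose margins match the targets by alternately
    rescaling rows and columns. Starting values default to a table of ones."""
    n_rows, n_cols = len(row_targets), len(col_targets)
    if start is None:
        table = [[1.0] * n_cols for _ in range(n_rows)]
    else:
        table = [list(row) for row in start]
    for _ in range(max_iter):
        # Rescale each row to match its target total.
        for i in range(n_rows):
            factor = row_targets[i] / sum(table[i])
            table[i] = [value * factor for value in table[i]]
        # Rescale each column to match its target total.
        for j in range(n_cols):
            factor = col_targets[j] / sum(table[i][j] for i in range(n_rows))
            for i in range(n_rows):
                table[i][j] *= factor
        # Stopping rule: row margins must still agree after the column step.
        worst = max(abs(sum(table[i]) - row_targets[i]) for i in range(n_rows))
        if worst < tol:
            break
    return table

# Made-up margins: rows total 60 and 40, columns total 70 and 30.
fitted = iterative_proportional_fit([60.0, 40.0], [70.0, 30.0])
```

With a starting table of ones, each fitted cell equals the product of its row and column totals divided by the grand total, which is the maximum likelihood estimate under independence.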

The SAS categorical modeling procedure performs hypothesis tests to determine if the estimated parameters are significantly different from zero. The test statistic is a Wald statistic, which is approximated by a chi-square distribution [Ref. 49: p. 33].


APPENDIX I. LOGISTIC REGRESSION 


The purpose of this appendix is to describe the regression techniques used in this 
thesis. 

The key issue in selecting the regression techniques is the dichotomous response variable. Soldiers make only one of two mutually exclusive reenlistment decisions, either reenlist or leave the service.30

Since the response variable is binary, the desired result of the regression equation is the probability of success (reenlistment) of a given soldier.

$$P_i = \Pr(Y_i = 1) \tag{13}$$

where $Y_i = (0, 1)$.

To apply an ordinary least squares regression to this, the following interpretation is made. The general form of the linear regression model is:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{14}$$

If $P_i$ is the probability that $Y_i = 1$, then:

$$E[Y_i] = P_i = \beta_0 + \beta_1 X_i \tag{15}$$

since $E[\varepsilon_i] = 0$. This is the linear probability model [Ref. 50: p. 12] [Ref. 35: p. 756].

There are a number of reasons why using ordinary least squares regression is not adequate for models having categorical response variables.

• By definition, the probability $P_i$ in Equation 13 must take on values between 0 and 1. However, in the linear regression model, the $P_i$ can fall outside the 0, 1 range. Figure 18 shows this, where the solid line represents an actual probability function, and the dashed line represents a linear approximation to it. In this example, the linear approximation goes outside the 0, 1 range for admissible $\beta_0 + \beta_1 X_i$ [Ref. ?].


30 Some researchers study a multinomial reenlistment choice; however, for reasons described 
in Appendix B, this study uses a dichotomous response variable. 


Figure 18. Linear Approximation to a Probability Function (linear approximation shown as a dashed line) 


• Linear regression uses the assumption of constant variance of errors, $E[\varepsilon_i^2] = \sigma^2$. 
However, the variance of the error term for a binary variable, where each observation 
is assumed to be a Bernoulli trial with probability of success $P_i$, is: 

$$\mathrm{Var}(\varepsilon_i) = (\beta_0 + \beta_1 X_i)(1 - \beta_0 - \beta_1 X_i) \qquad (16)$$ 

Since the variance of the errors depends on the observation, the $\varepsilon_i$ do not have 
constant variance. Use of ordinary least squares regression models produces inefficient 
estimates and imprecise predictions [Ref. 35: pp. 419-422]. 


• The assumption that the $Y_i$ are normally distributed is not valid with binary data. 
This is obvious, as the $Y_i$ are either 0 or 1. Since they are not normally distributed, 
no estimation that is linear in $Y_i$ is efficient [Ref. 35: pp. 419-422]. 


• The usual tests of significance for the estimated coefficients do not apply when using 
ordinary least squares on observations with binary response variables; estimated 
standard errors are not constant, and $R^2$ does not have its usual interpretation [Ref. 
35: pp. 419-422]. 
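
The first two problems in the list above can be seen in a small numerical sketch. The data below are invented for illustration only (they are not thesis data), and the helper names are hypothetical. An ordinary least squares line is fitted to a 0/1 response, and the Bernoulli error variance is evaluated at different success probabilities.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for one explanatory variable: returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def bernoulli_error_variance(p):
    """Variance of the error term when the response is a Bernoulli trial."""
    return p * (1.0 - p)


# Invented binary responses (1 = reenlist, 0 = leave) against one variable.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = ols_fit(xs, ys)
fitted = [b0 + b1 * x for x in xs]
print("fitted values:", [round(f, 3) for f in fitted])

# The error variance changes with the success probability.
for p in (0.1, 0.5, 0.9):
    print(p, bernoulli_error_variance(p))
```

In this example the fitted values at the ends of the range fall below 0 and above 1, and the error variance is largest when the probability is 0.5, so it cannot be constant across observations.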

The solution to the above problems is a transformation. The two most widely used 
transformations are the probit and the logit transformations. The probit transformation, 
which is based on the normal CDF, is: 

$$P_i = \int_{-\infty}^{\beta_0 + \beta_1 X_i} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt \qquad (17)$$ 


The logit transformation, which is used because of its close approximation to the normal 
CDF, is: 

$$P_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}} \qquad (18)$$ 
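
The closeness of the logistic function to the normal CDF can be checked numerically. The sketch below is illustrative only. It assumes the conventional rescaling constant 1.702 for the logistic argument, which is not mentioned in this thesis, and compares the two curves on a grid.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Logistic function used by the logit transformation."""
    return 1.0 / (1.0 + math.exp(-z))


SCALE = 1.702  # assumed rescaling constant, chosen to align the two curves

grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)
print(f"largest difference on the grid: {max_gap:.4f}")
```

After rescaling, the two functions differ by less than one percentage point over the grid, which is why the logit is a convenient stand-in for the probit.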


Both of these transformations work well when there are sufficient repeated observations 
(when the explanatory variables are categorical). If, however, there are few 
repeated observations (continuous explanatory variables), then a maximum likelihood 
estimation of the logit model is used.31 The data for the model are shown in Figure 19. 


Figure 19. Data Format for Logistic Regression (data columns: number of trials in observation i; number of successes in observation i; explanatory variables) 


In this case the explanatory variables are continuous, there is only one trial per 
observation ($M_i = 1$), and $S_i$ is either 1 or 0 (success or failure) [Ref. 35: pp. 419-422]. 


31 While the logit transformation is somewhat arbitrary, it is selected because it is simple, 
tractable and well behaved even when the normality of $L_i$ is violated. 




The following discussion of the development of logistic regression is summarized 
from Judge [Ref. 35: pp. 425-436] and Nerlove [Ref. 51: pp. 14-22]. Using the binomial 
distribution, the probability of a success in observation i is defined as: 

$$P\{S_i = s\} = \binom{M_i}{s} P_i^{s} (1 - P_i)^{M_i - s} \qquad (19)$$ 

where $M_i = 1$ and $s = 1$. 


The logit transformation is: 

$$P_i = \frac{1}{1 + e^{-x_i'\beta}} \qquad (20)$$ 
where: 
en =e (21) 
i 
The maximum likelihood function is: 

$$L = \prod_{i=1}^{k} \binom{M_i}{S_i} P_i^{S_i} (1 - P_i)^{M_i - S_i} \qquad (22)$$ 


Following the procedure for computing a maximum likelihood estimator in Larsen [Ref. 
52: p. 262], first take the natural log of the likelihood function and substitute the 
expressions for $P_i$ and $1 - P_i$: 


$$\ln L = \sum_{i=1}^{k} \left[ \ln\binom{M_i}{S_i} - S_i \ln\left(1 + e^{-x_i'\beta}\right) - (M_i - S_i) \ln\left(1 + e^{x_i'\beta}\right) \right] \qquad (23)$$ 
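
The substitution step can be checked numerically. The sketch below evaluates the binomial log-likelihood two ways: directly from the success probabilities, and in the substituted form that contains only the linear predictor. The grouped data (trials, successes, explanatory values with a leading 1 for the intercept) and the coefficient values are invented for illustration.

```python
import math


def linear_predictor(x, beta):
    """Inner product of the explanatory values and the coefficients."""
    return sum(b * v for b, v in zip(beta, x))


def log_likelihood_direct(data, beta):
    """Sum of ln C(M, S) + S ln P + (M - S) ln(1 - P) over observations."""
    total = 0.0
    for m, s, x in data:
        p = 1.0 / (1.0 + math.exp(-linear_predictor(x, beta)))
        total += math.log(math.comb(m, s)) + s * math.log(p) + (m - s) * math.log(1.0 - p)
    return total


def log_likelihood_substituted(data, beta):
    """Same quantity after substituting the logit expressions for P and 1 - P."""
    total = 0.0
    for m, s, x in data:
        z = linear_predictor(x, beta)
        total += (math.log(math.comb(m, s))
                  - s * math.log(1.0 + math.exp(-z))
                  - (m - s) * math.log(1.0 + math.exp(z)))
    return total


# Invented grouped data: (number of trials, number of successes, [1, x]).
data = [(10, 3, [1.0, 0.0]), (12, 6, [1.0, 1.0]), (8, 6, [1.0, 2.0])]
beta = [-0.8, 0.9]  # hypothetical coefficient values

print(log_likelihood_direct(data, beta))
print(log_likelihood_substituted(data, beta))
```

Both functions return the same value up to rounding, which confirms that the substituted form is only an algebraic rewriting of the binomial log-likelihood.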


The next step is to take the derivative and set it equal to zero; however, the resulting 
equations cannot be solved directly, as the derivative is non-linear in the estimators. 
Instead, a Newton-Raphson method is used to find a numerical solution to the problem 
using an iterative procedure. The initial conditions are: 
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Xp = In (24) 
Jl; ap Ss 
The first step of the iteration is to compute the weights: 
0 
Ver! a a 
SS vp > 2 (25) 
(1 m ) 
Uj = WX, (26) 
p 
Mi, as H; / x 6? y 0 a= 
lH; = E al = N Mie o 2 yb; (24) 


j=! 


The next step is to perform a least squares regression of dependent variables $Y_i$ and the 


dm 
e 


Ea Med dependent variables Liese ln 
E E (28) 


Next, the estimates $\beta^{0}$ are updated. 


pr = p (29) 
i 
D (30) 
j—l 


The procedure is continued until the estimates converge. 
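
The iteration can be sketched in a few lines. This is a minimal Newton-Raphson fit for a logit model with an intercept and one explanatory variable. It is not the SAS LOGIST implementation, and it applies the Newton step directly rather than through the weighted least squares form described above. It starts from zero coefficients rather than the initial conditions given earlier, and the data are invented.

```python
import math


def fit_logit_newton(data, tol=1e-10, max_iter=50):
    """Fit P = 1 / (1 + exp(-(b0 + b1 x))) to grouped data [(trials, successes, x), ...]."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # negative Hessian (information matrix)
        for m, s, x in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            weight = m * p * (1.0 - p)
            residual = s - m * p
            g0 += residual
            g1 += residual * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        # Solve the 2 x 2 system H * step = g by Cramer's rule.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Invented data: 10 trials at x = 0 with 2 successes, 10 trials at x = 1 with 6.
b0, b1 = fit_logit_newton([(10, 2, 0.0), (10, 6, 1.0)])
print(round(b0, 4), round(b1, 4))
```

With only two distinct values of the explanatory variable the model is saturated, so the fitted probabilities reproduce the observed proportions 0.2 and 0.6. This gives a simple check that the iteration has converged to the maximum likelihood estimates.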


Using this procedure, the probability of success with a given set of explanatory 
variables is: 

$$P_i = \frac{1}{1 + e^{-x_i'\hat{\beta}}} \qquad (31)$$ 

The above discussion is summarized from Judge [Ref. 35: pp. 425-436] and Nerlove [Ref. 
51: pp. 14-22]. 
The statistical software used in this study is the LOGIST procedure of the SAS statistical 
package [Ref. 53: pp. 181-202]. The procedure uses the maximum likelihood estimates 
described above. Some specifics on the assumptions of the procedure, and on the test 
statistics, follow: 


• The assumption of the binary model is that the probability that $Y_i = 1$ is given by 
Equation 31. 

• The response variable can be nominally scaled. 

• The logit model has few assumptions, and is robust to violations of the assumptions of 
ordinary least squares regression. 

• The logit transformation can be applied to a multivariate setting. This is justified, 
because the marginal distributions of the multivariate logit transformations are 
themselves logit transformations. 

• The SAS LOGIST procedure examines two-way interactions between variables, but 
higher order interactions are assumed to be zero. 

• The form of the residuals is undetermined; however, the transformed residuals 
should be approximately normally distributed. 

• Tests of hypotheses and confidence intervals in the SAS LOGIST procedure are 
constructed from estimates of the asymptotic covariance matrix using Wald statistics. 
These rely on the asymptotic nature of the maximum likelihood estimator. 
The confidence intervals could also be determined using a bootstrapping (resampling) 
procedure developed by Efron [Ref. 54: pp. 5-18]. 

• The R statistic is similar to the multiple correlation coefficient in the normal setting 
after a correction is made to penalize for the number of estimated parameters. 

• The SAS LOGIST procedure has a forward stepwise regression option, which is 
used in this study. Where a least squares stepwise regression uses an F statistic for 
variable selection, the SAS LOGIST procedure uses Rao's efficient score statistic. 
Similar to least squares regression, care must be taken in using the stepwise 
SAS LOGIST procedure. If arbitrarily applied without proper safeguards, a stepwise 
procedure can lead to an inaccurate model. One of the most effective methods 
to ensure performance of a stepwise procedure is to cross-validate the model. 
These ¡issues are discussed my miore depth am guedi [Ic PAUN 


• If a variable is a linear combination of other variables already in the model, then 
it will not be added to the model in the stepwise SAS LOGIST procedure. 

• Finally, a SAS LOGIST NOFIT procedure is used as a diagnostic tool prior to the 
fitting of models using stepwise procedures. This procedure tests the null hypothesis 
that all regression coefficients are zero. The NOFIT option is useful in finding 
out if any modeling is worthwhile at all. 


The above are summarized from Judge [Ref. 35: pp. 425-436], Nerlove [Ref. 51: pp. 
14-22], and Harrell [Ref. 53: pp. 181-202]. 
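
The bootstrapping alternative mentioned in the list above can be illustrated in simplified form. The sketch below builds a percentile bootstrap interval for a single reenlistment rate rather than for a logistic regression coefficient, which keeps the example short. The outcomes, the number of resamples and the seed are assumptions made for illustration.

```python
import random


def bootstrap_rate_interval(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    rates = []
    for _ in range(n_resamples):
        # Resample the observations with replacement and record the rate.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(resample) / n)
    rates.sort()
    lower = rates[int((alpha / 2.0) * n_resamples)]
    upper = rates[int((1.0 - alpha / 2.0) * n_resamples) - 1]
    return lower, upper


# Invented cell: 40 reenlistments out of 100 soldiers.
outcomes = [1] * 40 + [0] * 60
lower, upper = bootstrap_rate_interval(outcomes)
print(lower, upper)
```

The same resampling idea extends to regression coefficients by refitting the model on each resample, at a correspondingly higher computational cost.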
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APPENDIX J. CLUSTER ANALYSIS RESULTS 


This appendix gives the results of the clustering of cells, which is described in 
Chapter V. The soldier population is first partitioned into 1080 cells, and then in a 
two-step procedure this number is reduced to thirty-six cells. The assumption is that each 
of these cells is a grouping of soldiers with a similar probability of reenlisting. The 
assumption is tested in this appendix, using a non-parametric goodness-of-fit test. 

The cells are coded to identify which groups of soldiers belong to them. The coding 
uses the seven variables used to define the cells. Those variables (in the order in which 
they appear in the coding) are as follows: 

• Term of Enlistment 

• Sex 

• Rank 

• Dependents 

• Race 

• Region 

• Job Type 


The number in each position of the coding represents the category of the variable 
represented. The possible categories for each variable are: 

• Term of Enlistment (2-two, 3-three or more years) 

• Sex (1-male, 2-female) 

• Rank (3-E3 or below, 4-E4, 5-E5 and above) 

• Dependents (1-no dependents, 2-married or single with dependents) 

• Race (1-white, 2-black, 3-other) 

• Region (1-northeast, 2-mid-atlantic, 5-south, 7-midwest, 8-west) 

• Job Type (1-low, 2-medium, 3-high civilian opportunity) 


An asterisk in the coding means that all categories in the given variable are 
combined, plus all categories of all remaining variables in the hierarchical structure are 
combined. Two numbers with parentheses around them represent two categories 
grouped together. 

Three examples of this coding scheme are provided. The first, in Equation 32, 
represents all soldiers who enlisted for three or more years, are male, are of rank E4, with 
dependents, are of an ethnic group other than white or black, are from the south, and 
are in an MOS that provides a medium level of civilian opportunity. 

3142352 (32) 


The coding of Equation 33 represents all soldiers who enlisted for two years and are 
female. (The asterisk means that the cell contains soldiers in all categories of the variables 
RANK, DEPENDENTS, RACE, REGION and JOB TYPE.) 

22* (33) 


The coding of Equation 34 represents all soldiers who enlisted for two years, are male, 
are of rank E3, and are either black or in the other ethnic code classification. 


Tables 10 and 11 give the composition of each cell. 
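
The coding scheme can be expressed as a small lookup. The sketch below decodes a cell code position by position under the category lists given above. The Term of Enlistment codes (2 and 3) are an assumption, because that line is damaged in the source. Parenthesized groups are not handled, and the function name is hypothetical.

```python
from math import prod

# Variables in coding order, with their category codes.
# The Term of Enlistment codes are assumed, not confirmed by the source.
CELL_VARIABLES = [
    ("Term of Enlistment", {"2": "two years", "3": "three or more years"}),
    ("Sex", {"1": "male", "2": "female"}),
    ("Rank", {"3": "E3 or below", "4": "E4", "5": "E5 and above"}),
    ("Dependents", {"1": "no dependents", "2": "married or single with dependents"}),
    ("Race", {"1": "white", "2": "black", "3": "other"}),
    ("Region", {"1": "northeast", "2": "mid-atlantic", "5": "south",
                "7": "midwest", "8": "west"}),
    ("Job Type", {"1": "low", "2": "medium", "3": "high"}),
]


def decode_cell(code):
    """Decode a code of single digits, optionally ended by an asterisk."""
    decoded = {}
    for position, (name, categories) in enumerate(CELL_VARIABLES):
        symbol = code[position] if position < len(code) else "*"
        if symbol == "*":
            # An asterisk combines this variable and all remaining variables.
            for remaining_name, _ in CELL_VARIABLES[position:]:
                decoded[remaining_name] = "all"
            break
        decoded[name] = categories[symbol]
    return decoded


# Number of cells in the full partition: the product of the category counts.
n_cells = prod(len(categories) for _, categories in CELL_VARIABLES)
print(n_cells)
print(decode_cell("3142352"))
print(decode_cell("22*"))
```

The product of the category counts is 2 x 2 x 3 x 2 x 3 x 5 x 3 = 1080, which agrees with the size of the initial partition stated at the start of this appendix.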

Figures 20 and 21 give the expected reenlistment rate for each of the 36 cells and 
the number of observations of a sample of 75,778 total. 

We now test the assumption that a cell is a grouping of soldiers with a similar 
probability of reenlistment. To do this, we use the validation data we have been saving. 
A chi-square goodness-of-fit test is performed, testing the assumed distribution function 
on each cell of the validation data. The hypothesis is that the observations in a given 
cell are distributed Binomial(n, p), where p is the estimated reenlistment rate given in 
Figures 20 and 21. In the test statistic in Equation 35, $O_1$ is the observed number of 
soldiers reenlisting, $O_2$ is the observed number of soldiers leaving the service, $E_1$ is the 
expected number of soldiers reenlisting, and $E_2$ is the expected number of soldiers 
leaving the service. 

$$T = \sum_{j=1}^{2} \frac{(O_j - E_j)^2}{E_j} \qquad (35)$$ 
The decision rule is to reject $H_0$ if $T$ is greater than the $(1-\alpha)$ quantile of a 
chi-square random variable with 1 degree of freedom. In this test, $\chi^2_{1-\alpha} = 3.841$ for 
$\alpha = 0.05$ and $\chi^2_{1-\alpha} = 10.83$ for $\alpha = 0.001$. Figures 20 and 21 list the reenlistment 
rate for each validation cell and the $T$ statistic for each cell. For any goodness-of-fit 
test, the null hypothesis is rejected if the sample size is allowed to get large enough 
[Ref 27: pp. 190-191]. Cells 15 and 55 show this, as they are cells with larger sample sizes 
and moderate differences in probability (less than one percent), yet they have large $T$ 
statistics. Therefore, even though some of the tests reject the null hypothesis, the overall 
effect of the chi-square test is to confirm the distributional assumptions of the cells. 
We conclude that we have partitioned the population into cells of soldiers with 
similar reenlistment probabilities. 
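
The goodness-of-fit test for a single cell, and its sensitivity to sample size, can be sketched as follows. The counts are invented for illustration and do not come from Figures 20 and 21. The function name is hypothetical, and the critical values 3.841 and 10.83 are the standard chi-square quantiles with one degree of freedom at the 0.05 and 0.001 levels.

```python
import math


def cell_chi_square(n_reenlist, n_total, expected_rate):
    """Chi-square statistic and p-value for one cell (1 degree of freedom)."""
    observed = [n_reenlist, n_total - n_reenlist]
    expected = [n_total * expected_rate, n_total * (1.0 - expected_rate)]
    t = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(chi2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t / 2.0))
    return t, p_value


# Same one-point difference in rate (0.41 observed, 0.40 expected),
# evaluated at two invented sample sizes.
t_small, p_small = cell_chi_square(410, 1000, 0.40)
t_large, p_large = cell_chi_square(41000, 100000, 0.40)
print(round(t_small, 3), round(t_large, 3))
```

The statistic grows in proportion to the sample size for a fixed difference in rates, so the smaller cell passes the test at the 0.05 level while the larger cell is rejected even at the 0.001 level.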


Table 10. CLUSTER RESULTS BY ZONE 


pep AA 
| 


Cell 16 
E Y 
ENIS 


Cell S$ 
| 
Cell 13 E 


(cont) 
T S SE 3152122 | 
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Table 11. CLUSTER RESULTS BY ZONE (CONTINUED) 







Cell 38 oie 
Cella? 


Cell 4] 
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MODEL BUILDING DATA VALIDATION DATA 


CELL    SAMPLE SIZE    PERCENT REENLISTING    SAMPLE SIZE    PERCENT REENLISTING 
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43 
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Figure 20. 
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Number of Observations and Reenlistment Rates by Cell 


MODEL BUILDING DATA    VALIDATION DATA 

CELL    SAMPLE SIZE    PERCENT REENLISTING    SAMPLE SIZE    PERCENT REENLISTING 


15960033 7953247 
«600000 ALO Se 
~647376 76695835 
.404270 «402346 
og ~459476 Y63117 
66 . 206540 -22835 
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Figure 21. Number of Observations and Reenlistment Rates by Cell (Continued) 
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APPENDIX K. REGRESSION ANALYSIS RESULTS 


The purpose of this appendix is to present the regression analysis results for each 
cell. A stepwise logistic regression procedure estimates the coefficients. A description 
of the method of inclusion of variables appears in Appendix I. Except for the intercept 
terms, all coefficients are significant at the $\alpha = 0.05$ level. Those intercept terms for 
which $\alpha > 0.05$ are marked with a double asterisk. Estimates with a single asterisk are 
significant at the $\alpha = 0.01$ level. Table 12 and Table 13 list the results. 

The results are the transformed coefficient estimates. To compute the actual 
reenlistment rates, use Equation 35, where $\beta$ is the vector of estimates, and $X$ is the 
vector of variable observations. 

$$P = \frac{1}{1 + e^{-X'\beta}}$$ 

The variable labels of the tables are as follows: 


• Inter INTERCEPT 

• Var 1 BONUS LEVEL 

• Var 2 REENLISTMENT SYSTEM 

• Var 3 AFQT SCORE 

• Var 4 PROMOTION RATE 

• Var 5 PAY RATE 

• Var 6 AGE AT ENTRY 

• Var 7 UNEMPLOYMENT RATE 


UNEMPLOYMENT RATE is not listed on the chart. Only two cells include this variable, 
and the results are listed here. Cell 52 includes the variable UNEMPLOYMENT RATE 
with a coefficient estimate of 0.105. It is significant at the $\alpha = 0.01$ level. Cell 73 
includes the variable UNEMPLOYMENT RATE with a coefficient estimate of -0.036. It 
is significant at the $\alpha = 0.01$ level. The R values are listed under the cell number for each 
cell. 
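
Converting a row of coefficient estimates into a reenlistment rate is a one-line calculation once the linear predictor is formed. The sketch below uses hypothetical coefficients (an intercept and a bonus level coefficient), not values read from Tables 12 and 13, to show how the predicted rate responds to the bonus level.

```python
import math


def reenlistment_rate(beta, x):
    """Logistic response: 1 / (1 + exp(-x'beta)); x starts with 1.0 for the intercept."""
    z = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical estimates: [intercept, bonus level coefficient].
beta = [-1.0, 0.18]
rates = [reenlistment_rate(beta, [1.0, bonus]) for bonus in range(0, 5)]
for bonus, rate in enumerate(rates):
    print(bonus, round(rate, 3))
```

A positive coefficient raises the predicted rate as the variable increases, and a negative coefficient lowers it. The effect on the rate is not constant, since the logistic curve is steepest near a rate of one half.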
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Table 12. REGRESSION RESULT TDR ONE 


Cells (R Value)    Inter    Var 1    Var 2    Var 3    Var 4    Var 5    Var 6 





Cell 1 
(0.095) 






MOSS 


| i pe 5 | 5 l T a 
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i | | | l o e Tle 
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als) 
Gel - 1.097 209 OS 0.057 
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(0.10) sh th 
cC e1. 004 OO TS TE 0.040 + 
MS) 
Cell 2s 0.940 * 0.179 * -0.010 * | -0.025 * 
(0.144) 
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Table 13, REGRESSION RESULTS BY ZONE (CONTINUED) 
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Cell 39 -0.239 0.055 DTS 
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Copa OO? 32° 
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0) 
0.022 OS. 
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